SUMMARY


The use of a computer simulation model may be viewed as an experiment
in which a set of k input variables is combined to produce at least one output
or response variable. As in any experimental situation, the design of a
computer simulation experiment is important. In general, not all k input
variables or factors will be equally important in their effect on the response
variable(s). It is very common to find that only a subset, say g < k, of the
original k factors is important in explaining the response. We usually do not
know the value of g, or which g factors are important.

The problem of experimentation and analysis to discover the size and
composition of the subset of g active factors is called the factor screening
problem. It is important to identify the set of active factors accurately.
Failure  to  identify  an  active  factor  can  result  in  serious  bias  in  the  analysis 
and  conclusions  drawn  from  the  model,  if  that  factor  is  subsequently  ignored. 
Conversely,  experimentation  with  negligible  factors  is  undesirable  as  it  consumes 
the  resources  of  experimentation  needlessly. 

This  report  contains  a survey  of  the  available  statistical  methodology 
useful in factor screening. It also discusses the relative merits of each
approach,  and  provides  guidelines  for  the  development  of  a factor  screening 
strategy.  Several  examples  are  presented  that  demonstrate  the  construction  of 
factor  screening  experiments,  and  the  interpretation  of  the  results  of  such 
experiments.

Three types of factor screening situations may be identified. The first
case is the designed experiment situation; that is, a situation in which an
experiment is designed and conducted with the primary objective of discovering
the set of active factors. The use of designed experiments in factor screening
is particularly important, as designed experiments allow assessment of main
effects and interactions independent of other effects that may be present in
the model. Designed experiments also often allow the incorporation of variance
reduction methods. Finally, they usually admit a relatively simple statistical
analysis.

The  major  classes  of  factor  screening  designs  discussed  in  this  report 
include:

1.  The  2k~p  and  2k”p  fractional  factorial  designs 

2.  Supersaturated  designs 

3.  Group  screening  designs 

4.  Irregular  fractional  factorials 

A logical screening strategy involving these designs is developed. The selection
between designs is based on consideration of the extent of aliasing of
interactions and the severity of the assumptions required to produce a unique
analysis of the data. In particular, it is shown that group screening followed
by the use of a $2^{k-p}$ fractional factorial design is often an optimal
screening approach. Variance reduction methods for these designs are discussed,
based on common and antithetic random number streams. Other problems discussed
include the composition of the groups in group screening and the selection of
levels for negligible factors in subsequent experiments.

A second major type of screening study is the undesigned case. These
situations occur when there are data available from previous simulation
experiments with the model, and decisions regarding the identification of active
factors must be made using these data. It is unlikely that these runs will
conform to any standard factor screening design. However, in these cases, the
method of least squares can be used to fit an appropriate regression model to
the data, and factor screening decisions can often be made using this model.
The usual nonorthogonality of such undesigned data makes the interpretation of
these models difficult. Standardized regression coefficients can be used to
simplify the interpretation, although this still does not solve the problems
created by a nonorthogonal data set. Several measures of nonorthogonality
are introduced, including variance inflation factors and condition numbers,
and the use of these measures in assessing the problems in interpreting
individual regression coefficients is discussed. In cases of extreme
nonorthogonality, parameter estimation methods other than least squares are
recommended.

The third type of factor screening study involves augmenting an available
data set with a small number of new runs. The question of where these
additional runs should be conducted is discussed. Two design augmentation methods
are proposed, one based on minimizing the variance of the parameter estimates,
and the other designed to minimize the bias resulting from factors thought to
be negligible.

This  work  was  supported  by  the  Office  of  Naval  Research  (ONR)  under 
contract  N00014-78-C-0312.  I am  grateful  to  Dr.  Thomas  C.  Varley  of  ONR  for  his 
advice  and  encouragement. 

1.  INTRODUCTION


1-1.  Uses  of  Simulation 

Many problems in operations research are too complex to be modeled and
analyzed entirely by mathematical methods. Computer simulation is widely used
in the study of such problems. Typical problem areas in which computer
simulation has been successfully employed include queueing, inventory, scheduling,
quality control/reliability analysis, and maintenance and repair activities.
The military has made extensive use of computer simulation to analyze complex
combat processes, as well as supply and logistics activities.

A computer  simulation  may  be  viewed  as  an  experiment  in  which  a set  of 
controllable  input  or  independent  variables  are  combined  to  produce  at  least 
one  output  variable,  usually  called  the  dependent  variable  or  response.  In 
performing  a computer  simulation  experiment,  the  analyst  will  usually  have 
one  of  two  objectives  in  mind: 

1.  Investigate  the  relationships  between  the  independent  variables  and 
the  response,  determining,  if  possible,  which  factors  exert  the  greatest  effect 
on  the  response,  and  the  extent  of  interaction  between  the  factors. 

2.  Determine the set of factor levels that, over some appropriate region
of interest, optimize the response(s).

As in any experiment, the design of a computer simulation experiment is
an important aspect of the investigation. The use of formal experimental
design methods in computer simulation results in significant advantages to
the analyst, including simplicity of data interpretation and (usually) economic
efficiency with respect to the total number of simulation runs required. For
background reading in experimental design, consult Cochran and Cox [1957],
Davies [date illegible], Hicks [1973], Montgomery [1976], or John [1971]. For
discussion of the specifics of applying experimental design methodology to
computer simulation, see Burdick and Naylor [1966], Fishman [1973], Hunter and
Naylor [1970], Ignall [1972], Kleijnen [1975a, part II], [1977], and
Montgomery and Evans [1975].

1-2.  The Need for Factor Screening

We shall assume that a computer simulation model may be described by a
set of k controllable input variables or factors. These factors are generally
of two types:

1.  Factors  that  are  controllable  or  subject  to  design  in  the  "real 
world"  system  being  modeled,  such  as  inventory  reorder  quantities,  service 
rates,  or  the  rate  of  fire  of  a weapons  system. 

2.  Factors that are not controllable in the real system, such as
demand, weather effects, or the location of enemy troops or equipment.

For purposes of conducting the experiment, however, all k factors will be assumed
to be controllable in the simulation; that is, we may induce desired weather
effects, or control the movements of an enemy submarine.

In general, not all of these k factors will be equally important with
respect  to  their  effect  on  the  response  variable(s).  The  factors  may  range  in 
importance  from  highly  important  to  negligible.  It  is  very  common  to  find 
that  only  a subset,  say  g < k,  of  the  original  k factors  are  important  in 
explaining  the  response  variable.  However,  generally,  we  do  not  know  the 
value  of  g,  nor  do  we  know  which  g factors  are  important.  This  situation  is 
discussed  by  Jacoby  and  Harrison  [1962],  who  state  that  the  problem  is 
frequently  encountered  in  computer  simulation. 

The problem of experimentation to discover the size and composition
of the subset of active factors is called the factor screening problem. It
is important that the set of active factors be accurately determined. Failure
to identify an active factor can lead to serious bias in the analysis and
conclusions drawn from a model, if that factor is ignored in subsequent
experiments. On the other hand, experimentation with negligible factors is
undesirable as it consumes the resources of experimentation needlessly, and may
increase the noise level in the data to the point where real effects are more
difficult to discover. For example, many of the optimization techniques
applied to computer simulation models decrease rapidly in efficiency as the
number of independent variables increases. Clearly, identification of the
set of active factors plays a critical role in the successful use of this
methodology.

Factor screening methods can be profitably employed at two places during
the development and use of a computer simulation model. They can be employed
at the model design and development stage. Applied at this stage, screening
methods can affect the choice of variables used in the model and hopefully
simplify the architecture of the final model. This may require experimentation
with components or subroutines of the model, or, when practical, experimentation
with the real-world system. When used in this manner, factor screening could
contribute significantly to reducing the running time of a simulation model,
if negligible factors can be identified. Factor screening is also applicable
to a complete simulation model, although it is unlikely that any major
simplification of the model structure will result. However, the total number of
computer runs that are to be made in exercising the model may be substantially
reduced if some factors are not active.

This report contains a summary of the available statistical methodology
useful in factor screening. It also discusses the relative merits of each
approach, and provides guidelines for development of a screening strategy.
Other questions, including the implementation of variance reduction methods,
choice of levels for factors thought to be negligible, and some details of
parameter estimation in linear statistical models, are also discussed.

1-3.  Factors,  Levels,  and  Parameter  Estimation 

Suppose that $x_1, x_2, \ldots, x_k$ are the controllable factors in a computer
simulation experiment and y is the (single) response. We assume that the
general structure of the simulation is such that it can be expressed in the
form

$$y = f(x_1, x_2, \ldots, x_k) + \epsilon. \tag{1-1}$$

In this equation, f is a functional relationship that determines the mean value
of the response y, and $\epsilon$ is an error term such that $E(\epsilon) = 0$. In factor
screening problems it is almost always sufficient to assume that f is linear
in the unknown parameters that relate the response to the factors. For
example, one possible model would be

$$y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \epsilon \tag{1-2}$$

where $\beta_0, \beta_1, \ldots, \beta_k$ are unknown parameters.

To  perform  an  experiment  with  this  system,  we  must  choose  a set  of  values 
or  levels  for  each  factor,  and  then  run  the  computer  simulation  model  at  some 
subset  (or  possibly  the  full  set)  of  the  factor  level  combinations.  The  choice 
of  the  number  of  levels  of  each  factor  and  their  spacing  when  the  factor  is 
continuous (or approximately so) is important. Generally, we should be guided
by  the  information  we  have  about  the  likely  effect  of  that  factor  on  the 
response  y. 

In most factor screening experiments, we are simply attempting to determine
the effect of the factor, not necessarily trying to develop a useful
predictive  or  interpolative  equation.  Consequently,  a relatively  small  number 
of  factor  levels  is  generally  employed.  Often  two  levels,  arbitrarily  called 
high  and  low,  are  sufficient.  For  example,  in  Figure  1 we  have  illustrated 
the  behavior  of  y as  a function  of  the  factor  x.  Although  y and  x are  related 
in  a complex  nonlinear  manner,  the  use  of  two  levels  for  x will  be  sufficient 
to  measure  the  effect  of  x.  However,  in  cases  where  extreme  curvature  is 
present  in  the  functional  relationship,  more  than  two  levels  will  be  necessary. 
Rarely,  however,  would  more  than  three  or  four  levels  of  the  factor  be 
employed  in  a factor  screening  study.  The  need  for  more  than  a small  number 
of  levels  often  indicates  that  the  region  of  exploration  for  x is  too  large. 

The spacing of factor levels is also important. Levels should be far
enough apart to measure anticipated effects, but not so far as to cause
nonlinearities in the functional relationship to distort or mask significant
effects.  For  example,  consider  Figure  2.  If  the  low  and  high  levels  of  x 
are the two most closely spaced levels, then (depending on the amount of noise)
it is highly unlikely that the effect of x on y will be discovered. On the other
hand, if the low and high levels are the two most widely separated levels, then
the curvature in the functional relationship will likely mask the true effect
of x. Either of the two intermediate pairs of levels, used as the low and high
levels of x, will reveal that x has a significant effect on y. Neither case,
however, would be sufficient for defining the effect of x so that a predictive
or interpolative equation valid over the entire range of x could be developed.

The effect of a factor may be defined as the change in response y
produced by a change in the levels of the factor. This is usually called a main
effect. For example, consider the data in Table 1, which presents information
obtained from an experiment with two factors $x_1$ and $x_2$. The main effect of $x_1$
is the difference between the average response at the high level of $x_1$ and the
average response at the low level of $x_1$, say

$$\frac{50 + 20}{2} - \frac{42 + 10}{2} = 9.$$

That is, the average response increase upon changing from the low to the high
level of $x_1$ is 9 units. Similarly, the main effect of $x_2$ is

$$\frac{50 + 42}{2} - \frac{20 + 10}{2} = 31.$$

                     Table 1
          Data for a Factorial Experiment

                          x2
                     low      high
        x1   low      10       42
             high     20       50

The  experimental  design  in  Table  1 is  a factorial  design;  that  is,  a 
design in which all possible factor level combinations are run. Furthermore,
there  is  only  one  observation  in  each  cell  (we  say  the  design  is  replicated 
once).  Most  screening  designs  are  factorial  designs. 
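
The main-effect arithmetic above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function name main_effect are our own assumptions, and the cell assignment reads Table 1 with $x_1$ as rows and $x_2$ as columns, which is the reading consistent with the 9-unit effect quoted in the text.

```python
# Responses from Table 1, keyed by (x1 level, x2 level).
# Assumed layout: rows are x1, columns are x2.
table1 = {
    ("low", "low"): 10,
    ("low", "high"): 42,
    ("high", "low"): 20,
    ("high", "high"): 50,
}


def main_effect(data, factor):
    """Average response at the high level minus average at the low level.

    factor is 0 for x1 and 1 for x2 (its position in the key tuple).
    """
    high = [y for levels, y in data.items() if levels[factor] == "high"]
    low = [y for levels, y in data.items() if levels[factor] == "low"]
    return sum(high) / len(high) - sum(low) / len(low)


effect_x1 = main_effect(table1, 0)  # (50 + 20)/2 - (42 + 10)/2
effect_x2 = main_effect(table1, 1)  # (50 + 42)/2 - (20 + 10)/2
print(effect_x1, effect_x2)         # 9.0 31.0
```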

Now consider the data in Table 2. Here the effect of $x_1$ is

$$\frac{30 + 20}{2} - \frac{42 + 10}{2} = -1$$

which implies that the $x_1$ effect is small. However, inspection of Table 2
reveals that the $x_1$ effect is not negligible; it just depends on the level of
factor $x_2$. For example, at low $x_2$ the $x_1$ effect is

$$20 - 10 = 10$$

and at high $x_2$ the $x_1$ effect is

$$30 - 42 = -12.$$

                     Table 2
              A Factorial Experiment

                          x2
                     low      high
        x1   low      10       42
             high     20       30

This  is  an  example  of  an  interaction  between  two  factors.  More 
specifically,  it  is  a two-factor  interaction.  Most  screening  studies  have  to 
make  certain  assumptions  about  the  types  of  interactions  that  are  likely  to  be 
present in the system in order to design an economically efficient experiment.
In  general,  factor  screening  attempts  to  sort  out  the  main  effects  and  low- 
order  interactions  that  drive  the  system. 
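
The Table 2 calculation can be sketched as follows. The -1/+1 coding of low and high, the function name, and the convention of reporting the interaction as half the difference between the two conditional effects are our assumptions for illustration; the report itself only quotes the two conditional effects and the main effect.

```python
# Table 2 responses keyed by (x1, x2), with -1 = low and +1 = high.
table2 = {(-1, -1): 10, (-1, +1): 42, (+1, -1): 20, (+1, +1): 30}


def conditional_effect_x1(data, x2_level):
    """Effect of moving x1 from low to high with x2 held at one level."""
    return data[(+1, x2_level)] - data[(-1, x2_level)]


at_low_x2 = conditional_effect_x1(table2, -1)   # 20 - 10
at_high_x2 = conditional_effect_x1(table2, +1)  # 30 - 42

# The main effect of x1 is the average of the two conditional effects.
main_x1 = (at_low_x2 + at_high_x2) / 2

# Assumed convention: interaction = half the difference of the
# conditional effects, so a value far from zero signals interaction.
interaction = (at_high_x2 - at_low_x2) / 2

print(at_low_x2, at_high_x2, main_x1, interaction)  # 10 -12 -1.0 -11.0
```

A main effect near zero alongside a large interaction is exactly the situation described above: the factor matters, but its effect changes sign with the level of the other factor.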

The method of least squares can be used to estimate the main effects
and interactions. Suppose that we can describe the system by a linear
statistical model, say

$$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \epsilon_i, \quad i = 1, 2, \ldots, n \tag{1-3}$$

where $y_i$ is the ith response, $x_{ij}$ is the ith level of factor j, and $\beta_j$,
$j = 0, 1, \ldots, k$, are unknown parameters. Letting $y = (y_1, y_2, \ldots, y_n)'$,
$\beta = (\beta_0, \beta_1, \ldots, \beta_k)'$, and $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)'$, where the prime denotes
transpose, and letting X denote an n x (k+1) matrix whose first column is all
ones and whose (i, j+1)st element is $x_{ij}$, then it is well known that (1-3)
can be written as

$$y = X\beta + \epsilon. \tag{1-4}$$

The least squares estimators of $\beta$ are given by the solution to the normal
equations

$$(X'X)\hat{\beta} = X'y, \tag{1-5}$$

or

$$\hat{\beta} = (X'X)^{-1}X'y \tag{1-6}$$

assuming that $(X'X)^{-1}$ exists.

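To illustrate, consider the data in Table 1, and assume that the high
and low levels of $x_1$ and $x_2$ can be represented by +1 and -1, respectively.
Then (1-3) becomes

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i, \quad i = 1, 2, 3, 4.$$

We have assumed that $x_1$ and $x_2$ do not interact. Then, in matrix notation,
we have for (1-4),

$$\begin{bmatrix} 10 \\ 20 \\ 42 \\ 50 \end{bmatrix} =
\begin{bmatrix} 1 & -1 & -1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} +
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{bmatrix}$$

and the normal equations $(X'X)\hat{\beta} = X'y$ are

$$\begin{bmatrix} 4 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 4 \end{bmatrix}
\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} =
\begin{bmatrix} 122 \\ 18 \\ 62 \end{bmatrix}$$

and the least squares estimates of the parameters in the model are

$$\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} =
\begin{bmatrix} 30.50 \\ 4.50 \\ 15.50 \end{bmatrix}.$$

Note that the least squares estimates of $\beta_1$ and $\beta_2$ are exactly half the
main effects of $x_1$ and $x_2$; that is,

$$\hat{\beta}_1 = \tfrac{1}{2}(9) = 4.50, \qquad \hat{\beta}_2 = \tfrac{1}{2}(31) = 15.50.$$

The parameter $\hat{\beta}_0 = 30.50$ is called the grand mean.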
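
As a check on the arithmetic, the normal equations for the Table 1 data can be formed and solved directly. The sketch below uses only plain Python lists; the helper names (transpose, matmul, solve, least_squares) are ours, and the Gauss-Jordan solver is a simple stand-in for whatever linear algebra routine would be used in practice.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def least_squares(X, y):
    """Return X'X, X'y and the solution of the normal equations."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * yi for x, yi in zip(row, y)) for row in Xt]
    return XtX, Xty, solve(XtX, Xty)


# Columns: intercept, x1, x2, coded -1 = low and +1 = high (Table 1 data).
X = [[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1]]
y = [10, 20, 42, 50]
XtX, Xty, beta = least_squares(X, y)
print(XtX)   # [[4, 0, 0], [0, 4, 0], [0, 0, 4]]
print(Xty)   # [122, 18, 62]
print(beta)  # [30.5, 4.5, 15.5]
```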

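If we wished to incorporate interaction into this analysis, we would
define the model as

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12} x_{i1} x_{i2} + \epsilon_i, \quad i = 1, 2, 3, 4.$$

It is readily verified that

$$\begin{bmatrix} 10 \\ 20 \\ 42 \\ 50 \end{bmatrix} =
\begin{bmatrix} 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_{12} \end{bmatrix} +
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{bmatrix}$$

and the normal equations become

$$\begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{bmatrix}
\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_{12} \end{bmatrix} =
\begin{bmatrix} 122 \\ 18 \\ 62 \\ -2 \end{bmatrix}$$

so that $\hat{\beta}_0 = 30.50$, $\hat{\beta}_1 = 4.50$, $\hat{\beta}_2 = 15.50$, and $\hat{\beta}_{12} = -0.50$.
From examining the estimates of the effects, we conclude that both factors
exert large (positive) main effects, while the two-factor interaction between
those factors is negligible.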
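
Because the coded columns of a two-level factorial are mutually orthogonal, the normal equations need not be solved explicitly: each estimate is the inner product of a column with y, divided by the number of runs. The following sketch applies this shortcut to the Table 1 data with the interaction column included. The variable names are ours and the coding (-1 = low, +1 = high) follows the example above.

```python
y = [10, 20, 42, 50]                   # Table 1 responses
x1 = [-1, +1, -1, +1]
x2 = [-1, -1, +1, +1]
x12 = [a * b for a, b in zip(x1, x2)]  # interaction column x1 * x2
columns = {"b0": [1, 1, 1, 1], "b1": x1, "b2": x2, "b12": x12}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Check orthogonality: every pair of distinct columns has zero inner product.
names = list(columns)
orthogonal = all(
    dot(columns[p], columns[q]) == 0
    for i, p in enumerate(names)
    for q in names[i + 1:]
)

# With X'X = N * I, each least squares estimate is (column . y) / N.
n = len(y)
estimates = {name: dot(col, y) / n for name, col in columns.items()}
print(orthogonal)  # True
print(estimates)   # {'b0': 30.5, 'b1': 4.5, 'b2': 15.5, 'b12': -0.5}
```

The interaction estimate of -0.5 is small relative to the two main-effect estimates, which is the basis for the conclusion drawn in the text.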

Users of statistically designed experiments are accustomed to analyzing
the resulting data by relatively formal methods, such as the analysis of
variance. In factor screening problems this is usually not done, and the
least squares estimates of the model parameters (or the effects) usually allow
significant factors to be identified. Often it is not practical to conduct
a formal analysis of variance because of the small number of degrees of
freedom that remain for error.

1-4.  Designed  and  Undesigned  Screening  Experiments 

The objective of a factor screening study is to discover as much as
possible about the factors that significantly affect the response. Designed
experiments are particularly useful in factor screening, as they allow
assessment of effects and interactions independent of other effects present in the
model, they often allow the incorporation of variance reduction methods, and
they usually admit a relatively simple statistical analysis. However, screening
is still possible in the undesigned case, such as where there are data available
from previous simulation runs. Once again, the method of least squares is useful
here, although the usual nonorthogonality of such undesigned data makes the
interpretation problem somewhat more difficult. Section 2 of this report will
deal with designed screening studies, and Section 3 will discuss some aspects
of undesigned screening situations, including the intermediate case in which
some observations can be added to an existing data set.

In both cases, the method of least squares will be used for parameter
estimation. We now state some useful results concerning least squares analysis
of the general linear model. The model is

$$y = X\beta + \epsilon,$$

where y is (n x 1), X is (n x p), $\beta$ is (p x 1), and $\epsilon$ is (n x 1). Note that the
number of observations n must at least equal the number of parameters p.
The least squares estimator of $\beta$ is

$$\hat{\beta} = (X'X)^{-1}X'y. \tag{1-7}$$

If $E(\epsilon) = 0$ and the model is correct then the least squares estimators are
unbiased; that is

$$E(\hat{\beta}) = \beta.$$

If the errors are uncorrelated with constant variance $\sigma^2$ then the covariance
matrix of the least squares estimator is

$$\mathrm{Cov}(\hat{\beta}) = \sigma^2 (X'X)^{-1}. \tag{1-8}$$

Note that the assumption of independent observations with constant variance
will likely not hold in a simulation experiment. In fact, there are cases
where the choice of variance reduction strategy induces a correlative structure
between the observations. In cases where the assumption of uncorrelated
errors with constant variance does not hold, the method of weighted least
squares is useful. If V is a matrix of weights (chosen proportional to the
variances and covariances of the errors) then the weighted least squares
estimator of $\beta$ is

$$\hat{\beta}_{WLS} = (X'V^{-1}X)^{-1}X'V^{-1}y. \tag{1-9}$$

$\hat{\beta}_{WLS}$ is an unbiased estimator for $\beta$ (as is $\hat{\beta}$). The covariance matrix for
$\hat{\beta}_{WLS}$ is

$$\mathrm{Cov}(\hat{\beta}_{WLS}) = (X'V^{-1}X)^{-1}\sigma^2. \tag{1-11}$$
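
A minimal sketch of the weighted least squares estimator is given below for the special case in which V is diagonal, that is, errors that are uncorrelated but have unequal variances. That restriction is our simplifying assumption so the example stays short; a correlated error structure, such as one induced by common random numbers, would require the full $V^{-1}$. The function names and the variance values are hypothetical.

```python
def solve(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def weighted_least_squares(X, y, variances):
    """WLS estimate (X'V^-1 X)^-1 X'V^-1 y for V = diag(variances)."""
    w = [1.0 / v for v in variances]  # diagonal of V^-1
    p = len(X[0])
    A = [[sum(wi * row[j] * row[m] for wi, row in zip(w, X))
          for m in range(p)] for j in range(p)]
    c = [sum(wi * row[j] * yi for wi, row, yi in zip(w, X, y))
         for j in range(p)]
    return solve(A, c)


# Table 1 design, coded -1/+1, with an intercept column.
X = [[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1]]
y = [10, 20, 42, 50]

# Equal variances reduce WLS to ordinary least squares.
print(weighted_least_squares(X, y, [1.0, 1.0, 1.0, 1.0]))  # [30.5, 4.5, 15.5]

# Hypothetical unequal variances give less weight to the noisier runs.
print(weighted_least_squares(X, y, [1.0, 4.0, 1.0, 4.0]))
```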

1-5.  Previous  Work  on  Factor  Screening  in  Simulation 

Although there is a substantial literature on factor screening, there
has been little analysis or interpretation of this methodology in the computer
simulation environment. Kleijnen [1975a,b], [1977] and Hunter and Naylor
[1970] have suggested the use of fractional factorial designs and group screening
methods (procedures in which factors are arranged in sets) in simulation.
However, they do not give any examples. Only Kleijnen [1975b] attempts to
give any guidelines for the choice of a factor screening strategy. Nolan and
Sovereign [1972] employ a group-screening strategy in a large-scale simulation
model of airlift and sealift operations. However, they do not give any details
of the screening methods used. Williams and Weeks [1974] have proposed using
special types of $p^n$ factorial designs for factor screening in simulation. Their
methodology potentially requires many computer simulation runs, and no
examples or evaluation of their methodology are given. In general, there does
not presently seem to be any systematic collection or evaluation of factor
screening methods available, nor is there much specific analysis of their use
in computer simulation. Some aspects of this will be dealt with in this
report.

2.  EXPERIMENTAL  DESIGN  METHODS  IN  FACTOR  SCREENING 
2-1.  Full  Factorial  Designs 

Full factorial experiments could be used for factor screening. The
most efficient design to consider is the $2^k$ factorial; i.e., k factors each
at two levels. It is relatively standard practice to denote the factors by
upper case letters such as A, B, etc., rather than the $x_1$, $x_2$, etc. notation
used previously. The statistical model for a $2^k$ design would include k main
effects, $\binom{k}{2}$ two-factor interactions, $\binom{k}{3}$ three-factor interactions, ..., and one
k-factor interaction. That is, for a $2^k$ design the complete model would contain
$2^k - 1$ effects. Two systems of notation for treatment combinations are widely
used. For example, in a $2^5$ design abd denotes the treatment combination with
factors A, B, and D at the high level and factors C and E at the low level.
A system of + and - signs is also occasionally useful, where + denotes the
high level of a factor and - denotes the low level. Thus ++-+- and abd are
equivalent notations. The treatment combinations may be written in standard
order by introducing the factors one at a time, each new factor being successively
combined with those above it. For example, the standard order for a $2^4$ design is
(1), a, b, ab, c, ac, bc, abc, d, ad, bd, abd, cd, acd, bcd, and abcd, where (1)
denotes the run with every factor at its low level.
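
The rule for generating standard order, and the correspondence between the two notations, can be sketched as follows. The function names standard_order and to_signs are our own; the sketch assumes factors are lettered a, b, c, ... in the order they are introduced.

```python
def standard_order(k):
    """Treatment combinations of a 2^k design in standard order."""
    letters = "abcdefghijklmnopqrstuvwxyz"[:k]
    combos = [""]
    for letter in letters:
        # Each new factor is combined with every combination listed so far.
        combos += [c + letter for c in combos]
    return [c if c else "(1)" for c in combos]


def to_signs(combo, k):
    """Convert letter notation such as 'abd' to sign notation '++-+-'."""
    letters = "abcdefghijklmnopqrstuvwxyz"[:k]
    present = set() if combo == "(1)" else set(combo)
    return "".join("+" if letter in present else "-" for letter in letters)


print(standard_order(4))
# ['(1)', 'a', 'b', 'ab', 'c', 'ac', 'bc', 'abc',
#  'd', 'ad', 'bd', 'abd', 'cd', 'acd', 'bcd', 'abcd']
print(to_signs("abd", 5))  # ++-+-
```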

For even a moderate number of factors the total number of runs in
a $2^k$ factorial design is large. For example, a $2^5$ has 32 treatment
combinations, a $2^6$ has 64 treatment combinations, and so on. Since resources
are usually limited, the number of replicates that the experimenter can employ
may be restricted. Frequently, available resources will only allow a single
replicate of the design to be run, unless the experimenter is willing to omit
some of the original factors. Most factor screening experiments would fall into
this category.

With only a single replicate of the $2^k$ it is impossible to compute an
estimate of experimental error, that is, a mean square for error. Thus, it
seems that hypotheses concerning main effects and interactions cannot be tested.
However, the usual approach to the analysis of a single replicate of the $2^k$ is
to assume that certain higher-order interactions are negligible. The statistical
analysis of these designs is well known (see John [1971] or Montgomery [1976]).
Either Yates' tabular algorithm or the regression approach outlined in Section
1 may be used to estimate the effects. The variance of the estimate of any
effect is $4\sigma^2/N$ when the effect is expressed as a difference between two
averages (equivalently, $\sigma^2/N$ for the corresponding regression coefficient,
which is half the effect), where N is the total number of observations, assuming
that observations are independent with common variance $\sigma^2$. Note that the
regression treatment of the data in Table 1 is the analysis of a $2^2$ design.
The smallest design for which this procedure is recommended is the $2^4$.
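
Yates' tabular algorithm mentioned above can be sketched compactly. The version below is a standard textbook form of the algorithm, not code taken from this report; it is applied to the Table 1 data purely to show that it reproduces the effects computed earlier, with factor A playing the role of $x_1$ and factor B the role of $x_2$.

```python
def yates(responses):
    """Yates' algorithm for a single replicate of a 2^k design.

    responses must be in standard order: (1), a, b, ab, c, ...
    Returns the contrast column in the order total, A, B, AB, C, ...
    """
    n = len(responses)
    k = n.bit_length() - 1
    if n < 2 or 2 ** k != n:
        raise ValueError("need 2^k responses with k >= 1")
    col = list(responses)
    for _ in range(k):
        pairs = [(col[i], col[i + 1]) for i in range(0, n, 2)]
        # Top half: sums of adjacent pairs; bottom half: differences.
        col = [a + b for a, b in pairs] + [b - a for a, b in pairs]
    return col


# Table 1 in standard order: (1) = 10, a = 20, b = 42, ab = 50.
contrasts = yates([10, 20, 42, 50])
n = 4
grand_mean = contrasts[0] / n
effects = [c / (n / 2) for c in contrasts[1:]]  # A, B, AB

print(contrasts)            # [122, 18, 62, -2]
print(grand_mean, effects)  # 30.5 [9.0, 31.0, -1.0]
```

Dividing each contrast by N rather than N/2 gives the regression coefficients of Section 1, which are half the effects.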

The practice of combining higher-order interaction mean squares to estimate
the error is subject to criticism on statistical grounds. If some of these
interactions are significant, then the estimate of error will be inflated. As
a result, other significant effects may not be detected and the significant
interactions used as error will not be discovered. As a general rule, it is
probably unwise to assume two-factor interactions to be zero without prior
information. If most two-factor interactions are small, then it seems likely that
all higher-order interactions will be insignificant also. (A word of caution
here: one does not have to look very far for counterexamples to these rules.)

In most factor screening studies, we will be willing to assume that
certain high-order interactions (say three-factor and higher) are negligible.
Considering the amount of information provided by a $2^k$ factorial, this is
probably reasonable. For example, consider a $2^5$. The 32 observations allow
31 effects to be estimated:

     5 main effects
    10 two-factor interactions
    10 three-factor interactions
     5 four-factor interactions
     1 five-factor interaction

In many situations, our interest would be confined to detecting main effects
and the two-factor interactions. Thus we could either use the 16 higher-order
effects as an estimate of error, or as the basis of developing a more efficient
design via fractional replication.
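
The effect counts quoted above follow directly from binomial coefficients, as the short sketch below shows. The function name effect_counts is ours.

```python
from math import comb


def effect_counts(k):
    """Number of effects of each order in a full 2^k factorial."""
    return {order: comb(k, order) for order in range(1, k + 1)}


counts = effect_counts(5)
total = sum(counts.values())  # equals 2^5 - 1
higher_order = sum(n for order, n in counts.items() if order >= 3)

print(counts)        # {1: 5, 2: 10, 3: 10, 4: 5, 5: 1}
print(total)         # 31
print(higher_order)  # 16
```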

When a large number of effects are estimated, we may wish to find some
formal basis for declaring which effects are significant. If there is either
replication or insignificant factors pooled to estimate error, we could possibly
use analysis of variance methods and conduct formal statistical tests. However,
if variance reduction methods such as common random numbers have been used,
the usual analysis of variance statistical tests may not be appropriate. For
a discussion of this problem in simple designs, see Heikes, Montgomery, and
Rardin [1976]. A useful approach is to plot the effects on normal
probability paper. Negligible effects on such a display will fall approximately
along a straight line, while real effects will lie far from the line. For
examples of this methodology in a general experimental design setting, see
Montgomery [1976]. We will illustrate the approach in subsequent examples.
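
The coordinates for such a normal probability plot can be computed without special paper. In the sketch below the effect estimates are hypothetical numbers invented only to show the mechanics, and the plotting position (i - 0.5)/m is one common convention that we assume here; the report does not specify one.

```python
from statistics import NormalDist


def normal_plot_points(effects):
    """Pair each ordered effect with its expected standard normal quantile."""
    m = len(effects)
    ordered = sorted(effects.items(), key=lambda item: item[1])
    std_normal = NormalDist()
    return [
        (name, value, std_normal.inv_cdf((i - 0.5) / m))
        for i, (name, value) in enumerate(ordered, start=1)
    ]


# Hypothetical effect estimates from a 2^3 design (illustration only).
effects = {"A": 21.6, "B": 3.1, "C": 9.9, "AB": 0.8,
           "AC": -18.1, "BC": 2.4, "ABC": -1.9}

for name, value, score in normal_plot_points(effects):
    print(f"{name:>4} {value:7.2f} {score:6.2f}")
```

Plotting value against score, effects that are only noise should fall near a line through the origin, while the largest positive and negative effects would stand apart at the two ends.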

The $2^k$ factorial series has a projection property useful in factor
screening. For example, consider the $2^3$ design in Figure 3. If factor A is
negligible, we can collapse the 8 runs from the $2^3$ in factors A, B, and C into
two replicates of a $2^2$ in factors B and C. In general, if we have a single
replicate of a $2^k$ and $h < k$ factors can be dropped because they seem
negligible, then the remaining data will always correspond to $2^h$ replicates
of a full factorial in the remaining k-h factors. These replicated design points
can be used to obtain an estimate of error.
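The projection property can be checked directly by enumeration. The following is a minimal sketch, assuming a -1/+1 coding of the factor levels; the helper name `project` is illustrative and not part of the report.

```python
from itertools import product
from collections import Counter

# Full 2^3 design in factors A, B, C, coded -1 / +1.
runs = list(product([-1, 1], repeat=3))

def project(design, keep):
    """Count how often each level combination of the kept columns occurs."""
    return Counter(tuple(run[j] for j in keep) for run in design)

# Drop factor A (column 0) and keep B and C (columns 1 and 2).
counts_bc = project(runs, keep=(1, 2))
print(counts_bc)  # every (B, C) combination should appear twice
```

Each of the four (B, C) combinations appears twice, which is the "two replicates of a $2^2$" described above.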

Full $2^k$ factorials are advantageous in screening in that they potentially
produce all of the information required to identify significant effects and
interactions. However, there are more resource-efficient methods that can
produce equivalent information.

2-2.  The $2^{k-p}$ Fractional Factorial Design

2-2.1  General Results

As the number of factors in a $2^k$ factorial design increases, the number
of runs required for a complete replicate of the design rapidly outgrows the
resources of most experimenters. A complete replicate of the $2^6$ design requires
64 runs. In this design only 6 of the 63 degrees of freedom correspond to main
effects, and only 15 degrees of freedom correspond to two-factor interactions.
The remaining 42 degrees of freedom are associated with three-factor and higher
interactions.

If the experimenter can reasonably assume that certain high-order
interactions are negligible, then information on main effects and low-order
interactions may be obtained by running only a fraction of the complete
factorial experiment. These fractional factorial designs are widely used in
industrial research, and have major applications in factor screening. For a
general introduction to the construction and elementary properties of these
designs refer to Montgomery [1976, ch. 10] or Box and Hunter [1961].

In a $2^{k-p}$ fractional factorial design, only a fraction of the $2^k$
treatment combinations are actually run. Specifically, a fraction of the $2^k$
design containing $2^{k-p}$ runs is called a $1/2^p$ fraction of the $2^k$, or,
more simply, a $2^{k-p}$ fractional factorial design. The designs discussed in
this section are regular fractions, that is, estimates of the effects are
orthogonal. The effects may be estimated by Yates' algorithm (John [1976],
Daniel [1977], Montgomery [1976]) or by generating the contrast for any factor
using the table of + and - signs for that design (which is equivalent to the
regression approach outlined in Section 1). The variance of the estimate of any
effect is $2^{p-k}\sigma^2$.

There are several methods of constructing these designs. One method of
constructing a $2^{k-p}$ fractional factorial design is to select p independent
generators (no chosen generator is a generalized interaction of the others),
constructing the $2^p$ blocks associated with those generators, and then selecting
one block as the fractional design. The defining relation for the design
consists of the p generators initially chosen and their $2^p - p - 1$ generalized
interactions.

The alias structure may be found by multiplying each effect, modulo 2, by
the defining relation. Care should be exercised in choosing the generators so
that effects of potential interest are not aliased with each other. Each effect
has $2^p - 1$ aliases. In most factor screening studies we assume higher-order
interactions (say third- or fourth-order and higher) to be negligible, and this
greatly simplifies the alias structure.

A second method of design construction is to consider the $2^{k-p}$ design as
a full factorial in h = k-p factors. Then the table of + and - signs for the
full $2^h$ design is written down, and the additional p factors added by equating
their factor levels with the products of certain factor levels in the full $2^h$.
As an example, consider the $2^{6-2}$ design. This is a 1/4 fraction of a $2^6$,
containing $2^{6-2} = 2^4 = 16$ rows. To construct this design form a $2^4$ design
in the factors A, B, C, and D, as shown in the left-hand panel of Table 3. Two
columns must be added to incorporate the fifth and sixth factors, E and F. These
factor levels are found in the center panel of Table 3, by equating E = ABC and
F = ACD. Note that this is equivalent to choosing generators I = ABCE and
I = ACDF and using the first procedure described above to construct the design.
The treatment combinations are shown in the right-hand panel of Table 3.
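The second construction method can be sketched in a few lines. This is an illustration only, assuming a -1/+1 coding and standard (Yates) run order with A varying fastest; the helper `treatment_label` is a hypothetical name introduced here.

```python
from itertools import product

def treatment_label(levels, names="abcdef"):
    """Lower-case letters of the factors at their high level; '(1)' if none."""
    label = "".join(n for n, x in zip(names, levels) if x == 1)
    return label or "(1)"

design = []
for d, c, b, a in product([-1, 1], repeat=4):  # A varies fastest
    e = a * b * c      # E = ABC
    f = a * c * d      # F = ACD
    design.append((a, b, c, d, e, f))

labels = [treatment_label(run) for run in design]
for run, label in zip(design, labels):
    print(run, label)
```

Every run produced this way satisfies ABCE = +1 and ACDF = +1, which is why the construction is equivalent to choosing the generators I = ABCE and I = ACDF.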

Since the generators of this design are I = ABCE and I = ACDF and the
generalized interaction of the generators ABCE and ACDF is BDEF, the complete
defining relation for this design is I = ABCE = ACDF = BDEF. To find the
aliases of any effect multiply that effect by each word in the defining relation.

Table 3

Construction of the $2^{6-2}$ Design With Generators I = ABCE and I = ACDF

A   B   C   D   E = ABC   F = ACD   Treatment Combination

For example, the alias of A is

A = BCE = CDF = ABDEF


It is easy to verify that every main effect is aliased by three-factor and
five-factor interactions, while two-factor interactions are aliased with each
other and with higher-order interactions. Thus, when we estimate A, for
example, we are really estimating A + BCE + CDF + ABDEF. The complete alias
structure is shown in Table 4. If three-factor and higher interactions are
negligible, this design gives clear estimates of main effects.
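The multiplication rule for aliases (letters that appear twice cancel) is easy to mechanize. The sketch below is illustrative; the function names `multiply`, `defining_relation`, and `aliases` are assumptions made for this example, and effects are represented simply as strings of capital letters.

```python
from itertools import combinations

def multiply(u, v):
    """Product of two effects: letters appearing twice cancel (exponents mod 2)."""
    return "".join(sorted(set(u) ^ set(v)))

def defining_relation(generators):
    """All products of one or more generators (2^p - 1 words)."""
    words = set()
    for r in range(1, len(generators) + 1):
        for combo in combinations(generators, r):
            w = ""
            for g in combo:
                w = multiply(w, g)
            words.add(w)
    return sorted(words, key=lambda w: (len(w), w))

def aliases(effect, words):
    """Aliases of an effect: its product with every word of the defining relation."""
    return sorted((multiply(effect, w) for w in words), key=lambda w: (len(w), w))

words = defining_relation(["ABCE", "ACDF"])
print(words)                 # ['ABCE', 'ACDF', 'BDEF']
print(aliases("A", words))   # ['BCE', 'CDF', 'ABDEF']
print(aliases("AB", words))  # ['CE', 'ADEF', 'BCDF']
```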

The $2^{k-p}$ fractional factorial design has the projection property noted
previously for the full $2^k$ design. In general, any $2^{k-p}$ fractional
factorial design can be projected into either a full factorial or a replicated
fractional factorial in some subset of $r \le k-p$ of the original factors. Those
subsets of factors providing fractional factorials are subsets appearing as
words in the complete defining relation. This is particularly useful in
screening experiments when we suspect at the outset of the experiment that most
of the original factors will have small effects. The original $2^{k-p}$
fractional factorial can then be projected into a full factorial (say) in the
most interesting factors.

For example, the $2^{6-2}$ fractional factorial will collapse to a single
replicate of a $2^4$ design in any subset of four factors that is not a word in
the defining relation. It will also collapse to a replicated one-half fraction
of a $2^4$ in any subset of four factors that is a word in the defining relation.
Thus, the design in Table 3 becomes two replicates of a $2^{4-1}$ in the factors
ABCE, ACDF, and BDEF, since these are the words in the defining relation.
There are 12 other combinations of the six factors, such as ABCD, ABCF, and so
on, for which the design projects to a single replicate of the $2^4$. This
design will also collapse to two replicates of a $2^3$ in any subset of three of
the six factors or four replicates of a $2^2$ in any subset of two factors.

Table 4

Alias Structure for the $2^{6-2}$ Design With I = ABCE = ACDF = BDEF

Effect    Aliases
A         BCE     CDF     ABDEF
B         ACE     DEF     ABCDF
C         ABE     ADF     BCDEF
D         ACF     BEF     ABCDE
E         ABC     BDF     ACDEF
F         ACD     BDE     ABCEF
AB        CE      BCDF    ADEF
AC        BE      DF      ABCDEF
AD        CF      BCDE    ABEF
AE        BC      CDEF    ABDF
AF        CD      BCEF    ABDE
BD        EF      ACDE    ABCF
BF        DE      ABCD    ACEF
ABF       CEF     BCD     ADE
CDE       ABD     AEF     BCF
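The projection counts quoted above can be verified by brute force. This is a sketch under the same -1/+1 coding, with the design rebuilt from E = ABC and F = ACD; the variable names are illustrative.

```python
from itertools import combinations, product
from collections import Counter

# 2^(6-2) design with E = ABC and F = ACD (columns ordered A..F).
design = [(a, b, c, d, a * b * c, a * c * d)
          for a, b, c, d in product([-1, 1], repeat=4)]
names = "ABCDEF"

full, half = [], []
for subset in combinations(range(6), 4):
    counts = Counter(tuple(run[j] for j in subset) for run in design)
    word = "".join(names[j] for j in subset)
    if len(counts) == 16:                      # every combination once: one replicate of a 2^4
        full.append(word)
    elif len(counts) == 8 and set(counts.values()) == {2}:
        half.append(word)                      # eight points, each twice: replicated half fraction

print(half)       # ['ABCE', 'ACDF', 'BDEF']
print(len(full))  # 12
```

The three subsets that give a replicated half fraction are exactly the words of the defining relation, and the remaining 12 four-factor subsets each give a complete $2^4$.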

To present a fractional factorial for which the projection property can
be visually demonstrated, consider the 1/2 fraction of the $2^3$ with generating
relation I = ABC. This could also be denoted as a $2^{3-1}$ design. The design is
shown in Table 5. The projection of this design into a full $2^2$ factorial is
accomplished by eliminating one of the original three factors. This is illustrated
in Figure 4.

2-2.2  Resolution  III  Designs 

It is useful to classify $2^{k-p}$ fractional factorial designs according to
their resolution. The system is as follows:

(i)  Resolution III Designs. These are designs in which no main effect
is aliased with any other main effect, but main effects are aliased
with two-factor interactions and two-factor interactions are aliased
with each other. The $2^{3-1}$ design in Table 5 is of resolution III.

(ii)  Resolution IV Designs. These are designs in which no main effect is
aliased with any other main effect or two-factor interaction, but
two-factor interactions are aliased with each other. The $2^{4-1}$ design
with I = ABCD is of resolution IV.

(iii)  Resolution V Designs. These are designs in which no main effect
or two-factor interaction is aliased with any other main effect
or two-factor interaction, but two-factor interactions are aliased
with three-factor interactions. A $2^{5-1}$ design with I = ABCDE is
of resolution V.

In general, the resolution of a design is equal to the smallest number of
letters in any word in the defining relation. Consequently some authors refer
to these plans as three-letter, four-letter, and five-letter designs, respectively.

Table 5

The $2^{3-1}_{III}$ Design With I = ABC

A    B    C = AB    Treatment Combination
-    -    +         c
+    -    -         a
-    +    -         b
+    +    +         abc
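The rule that the resolution equals the length of the shortest word in the defining relation can be sketched as follows. The function names are hypothetical and the generators are passed as strings of capital letters.

```python
from itertools import combinations

def defining_words(generators):
    """Words of the defining relation: all products of the generators (letters cancel in pairs)."""
    words = []
    for r in range(1, len(generators) + 1):
        for combo in combinations(generators, r):
            letters = set()
            for g in combo:
                letters ^= set(g)
            words.append("".join(sorted(letters)))
    return words

def resolution(generators):
    """Length of the shortest word in the defining relation."""
    return min(len(w) for w in defining_words(generators))

print(resolution(["ABC"]))                        # 3
print(resolution(["ABCD"]))                       # 4
print(resolution(["ABCDE"]))                      # 5
print(resolution(["ABCE", "ACDF"]))               # 4
print(resolution(["ABD", "ACE", "BCF", "ABCG"]))  # 3
```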

We can show that a design is of resolution (2t+1) if we can estimate effects
of order t when effects of order higher than t are negligible. Roman numeral
subscripts are used to identify the resolution of a design. Thus, a
$2^{3-1}_{III}$ design is a $2^{3-1}$ design of resolution III. For the more highly
fractionated designs, more extensive assumptions are required to draw
conclusions from the data.

Resolution III and IV designs are particularly useful in factor screening
studies. This section will discuss the $2^{k-p}_{III}$ design. We may construct
resolution III designs for investigating up to k = N - 1 factors in N runs,
where N is a multiple of 4. Designs in which N is a power of 2 can be
constructed by the methods presented previously. Of particular importance are
designs requiring 4 runs for up to 3 factors, 8 runs for up to 7 factors, 16
runs for up to 15 factors, and 32 runs for up to 31 factors. If k = N - 1 the
fractional factorial design is said to be saturated.

A design for analyzing up to three factors in four runs is the $2^{3-1}_{III}$
design, presented in Table 5. Another very useful saturated fractional factorial
is a design for studying seven factors in eight runs; that is, the
$2^{7-4}_{III}$ design. This design is a one-sixteenth fraction of the $2^7$. It
may be constructed by first writing down the plus and minus levels for a full
$2^3$ in A, B, and C, and then generating the levels of four additional factors
using the interactions of the original three as follows: D = AB, E = AC,
F = BC, and G = ABC. Thus, the generating relations for this design are
I = ABD, I = ACE, I = BCF, and I = ABCG. The design is shown in Table 6.

Table 6

The $2^{7-4}_{III}$ Design With Generators I = ABD, I = ACE,
I = BCF, and I = ABCG

A    B    C    D=AB    E=AC    F=BC    G=ABC    Treatment Combination
-    -    -    +       +       +       -        def
+    -    -    -       -       +       +        afg
-    +    -    -       +       -       +        beg
+    +    -    +       -       -       -        abd
-    -    +    +       -       -       +        cdg
+    -    +    -       +       -       -        ace
-    +    +    -       -       +       -        bcf
+    +    +    +       +       +       +        abcdefg

The complete defining relation for this design is

I = ABD = ACE = BCF = ABCG = BCDE = ACDF = CDG = ABEF = BEG
  = AFG = DEF = ADEG = CEFG = BDFG = ABCDEFG


This design is a one-sixteenth fraction, and since the signs chosen for
the generators are positive, this is the principal fraction. It is also of
resolution III, since the smallest number of letters in any word of the defining
contrast is three. Any one of the 16 different $2^{7-4}_{III}$ designs could be
constructed by using the generators with one of the 16 possible arrangements of
signs in I = ±ABD, I = ±ACE, I = ±BCF, I = ±ABCG. All of these designs would
be said to belong to the same family.

The eight runs in this design may be used to estimate the seven main
effects. These estimates are obtained as linear combinations of the observations,
where the signs in a particular linear combination are given in the associated
column of Table 6. Thus, to estimate A, use the plus and minus signs in the
A column. Each of these effects has 15 aliases; however, if we assume that
three-factor and higher interactions are negligible, then considerable
simplification in the alias structure results. Making this assumption, each of
the linear combinations estimates a main effect together with three two-factor
interactions:

$\ell_A \rightarrow A + BD + CE + FG$
$\ell_B \rightarrow B + AD + CF + EG$
$\ell_C \rightarrow C + AE + BF + DG$
$\ell_D \rightarrow D + AB + CG + EF$                    (2-1)
$\ell_E \rightarrow E + AC + BG + DF$
$\ell_F \rightarrow F + BC + AG + DE$
$\ell_G \rightarrow G + CD + BE + AF$

where $\ell_i$ refers to the linear combination of treatment combinations given
by column i in Table 6.

The saturated $2^{7-4}_{III}$ design in Table 6 can be used to obtain resolution
III designs for studying fewer than seven factors in eight runs. For example,
to generate a design for six factors in eight runs, simply drop any one column
in Table 6, for example, column G. This produces the design shown in Table 7.

Table 7

A $2^{6-3}_{III}$ Design With Generators I = ABD, I = ACE, and I = BCF

A    B    C    D=AB    E=AC    F=BC    Treatment Combination
-    -    -    +       +       +       def
+    -    -    -       -       +       af
-    +    -    -       +       -       be
+    +    -    +       -       -       abd
-    -    +    +       -       -       cd
+    -    +    -       +       -       ace
-    +    +    -       -       +       bcf
+    +    +    +       +       +       abcdef

It is easy to verify that this is a $2^{6-3}_{III}$ design or a one-eighth
fraction of the $2^6$. The defining relation for the $2^{6-3}_{III}$ design is equal
to the defining relation for the original $2^{7-4}_{III}$ design with any words
containing the letter G deleted. Thus, the defining relation for this design is

I = ABD = ACE = BCF = BCDE = ACDF = ABEF = DEF

In general, when d factors are dropped to produce a new design, the new defining
relation is obtained as those words in the original defining relation that do
not contain any dropped letters. When constructing designs by this method,
care must be taken to obtain the best design. If we drop columns B, D, F, and
G from Table 6, we obtain a design for three factors in eight runs, yet the
treatment combinations correspond to two replicates of a $2^{3-1}$. The
experimenter would probably prefer to run a full $2^3$ design in A, C, and E.

It is also possible to obtain a resolution III design for studying up
to 15 factors in 16 runs. This saturated $2^{15-11}_{III}$ design can be generated
by first writing down the 16 treatment combinations associated with a $2^4$ in
A, B, C, and D, and then equating 11 new factors with the 2, 3, and 4-factor
interactions of the original 4. A similar procedure can be used for the
$2^{31-26}_{III}$ design, which allows up to 31 factors to be studied in 32 runs.

By combining fractional factorial designs in which certain signs are
switched, we can systematically isolate effects of potential interest. The
alias structure for any fraction with the signs for one or more factors
reversed is obtained by making changes of sign on the appropriate factors in
the alias structure of the original fraction.

Consider the $2^{7-4}_{III}$ design in Table 6. Suppose that along with this
principal fraction a second fractional design with the signs reversed in the
column for factor D is also run. That is, the column D in the second fraction
is

-  +  +  -  -  +  +  -

The effects that may be estimated from the first fraction are shown in (2-1) and
from the second fraction we obtain

$\ell^*_A \rightarrow A - BD + CE + FG$
$\ell^*_B \rightarrow B - AD + CF + EG$
$\ell^*_C \rightarrow C + AE + BF - DG$
$\ell^*_D \rightarrow -D + AB + CG + EF$
$\ell^*_E \rightarrow E + AC + BG - DF$
$\ell^*_F \rightarrow F + BC + AG - DE$
$\ell^*_G \rightarrow G - CD + BE + AF$

assuming that three-factor and higher interactions are insignificant. Now
from the two linear combinations of effects $\frac{1}{2}(\ell_i + \ell^*_i)$ and
$\frac{1}{2}(\ell_i - \ell^*_i)$ we obtain

i    From $\frac{1}{2}(\ell_i + \ell^*_i)$    From $\frac{1}{2}(\ell_i - \ell^*_i)$
A    A + CE + FG                              BD
B    B + CF + EG                              AD
C    C + AE + BF                              DG
D    AB + CG + EF                             D
E    E + AC + BG                              DF
F    F + BC + AG                              DE
G    G + BE + AF                              CD

Thus we have isolated the main effect of D and all of its two-factor
interactions. In general, if we add to a fractional factorial design of
resolution III or higher a further fraction with the signs of a single factor
reversed, then the combined design will provide estimates of the main effect
of that factor and its two-factor interactions.

Now suppose we add to any fractional factorial design a second fraction
in which the signs for all factors are reversed. This procedure breaks the alias
links between main effects and two-factor interactions. That is, we may use
the combined design to estimate all main effects clear of any two-factor
interactions. For example, suppose we added to the $2^{7-4}_{III}$ design in
Table 6 the second fraction shown in Table 8.


Table 8

A $2^{7-4}_{III}$ Design With All Signs Switched

A    B    C    D=-AB    E=-AC    F=-BC    G=ABC    Treatment Combination
+    +    +    -        -        -        +        abcg
-    +    +    +        +        -        -        bcde
+    -    +    +        -        +        -        acdf
-    -    +    -        +        +        +        cefg
+    +    -    -        +        +        -        abef
-    +    -    +        -        +        +        bdfg
+    -    -    +        +        -        +        adeg
-    -    -    -        -        -        -        (1)

The effects that may be estimated from this fraction are

$\ell^*_A \rightarrow -A + BD + CE + FG$
$\ell^*_B \rightarrow -B + AD + CF + EG$
$\ell^*_C \rightarrow -C + AE + BF + DG$
$\ell^*_D \rightarrow -D + AB + CG + EF$                    (2-2)
$\ell^*_E \rightarrow -E + AC + BG + DF$
$\ell^*_F \rightarrow -F + BC + AG + DE$
$\ell^*_G \rightarrow -G + CD + BE + AF$


Upon combining the two fractions and forming the linear combinations
$\frac{1}{2}(\ell_i + \ell^*_i)$ and $\frac{1}{2}(\ell_i - \ell^*_i)$, we obtain

i    From $\frac{1}{2}(\ell_i + \ell^*_i)$    From $\frac{1}{2}(\ell_i - \ell^*_i)$
A    BD + CE + FG                             A
B    AD + CF + EG                             B
C    AE + BF + DG                             C
D    AB + CG + EF                             D
E    AC + BG + DF                             E
F    BC + AG + DE                             F
G    CD + BE + AF                             G

Therefore clear estimates of all main effects and the two-factor interaction
alias groups are obtained.

The designs due to Plackett and Burman [1946] are also two-level
Resolution III fractional factorials. These designs can be used for studying
k = N - 1 variables in N runs, where N is a multiple of 4. If N is a power of
2, these designs are identical to those presented earlier in this section.
However, for N = 12, 20, 24, 28, and 36 the Plackett-Burman designs are
frequently useful.

The upper panel of Table 9 presents rows of plus and minus signs used to
construct the Plackett-Burman designs for N = 12, 20, 24, and 36, while the
lower panel of the table presents blocks of plus and minus signs for constructing
the design for N = 28. The designs for N = 12, 20, 24, and 36 are obtained by
writing the appropriate row in Table 9 as a column. A second column is then
generated from this first one by moving the elements of the column down one
position and placing the last element in the first position. A third column
is produced from the second similarly, and the process continued until column k
is generated. A row of minus signs is then added, completing the design. For
N = 28, the three blocks X, Y, and Z are arranged as

X Y Z
Z X Y
Y Z X

and a row of minus signs added to these 27 rows. The design for N = 12 runs
and k = 11 is shown in Table 10.
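The cyclic construction can be sketched with NumPy. Because the contents of Table 9 are not reproduced in this text, the generating row below is an assumption: it is the row commonly tabulated for N = 12 in published Plackett-Burman tables, not a value taken from this report.

```python
import numpy as np

# ASSUMPTION: commonly tabulated generating row for N = 12 (Table 9 is not available here).
row = np.array([+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1])

k = len(row)  # 11 factors
# The row is written as the first column; each later column shifts the previous
# one down by one position, with the last element moving to the top.
columns = [np.roll(row, shift) for shift in range(k)]
design = np.column_stack(columns)                       # 11 x 11 cyclic part
design = np.vstack([design, -np.ones(k, dtype=int)])    # final row of minus signs

gram = design.T @ design
print(design.shape)                                     # (12, 11)
print(np.array_equal(gram, 12 * np.eye(k, dtype=int)))  # columns are mutually orthogonal
```

The orthogonality check (`design.T @ design` equal to 12 times the identity) is what makes the main-effect estimates uncorrelated in this 12-run plan.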

The alias structure of the Plackett-Burman designs is complex. In
general, all two-factor interactions not involving factor Q (say) are aliased
with the estimate of Q. For example, in the 11 factor plan shown in Table 10,
each main effect is aliased with 45 two-factor interactions, and each
two-factor interaction appears in 9 of the 11 estimates of main effects. This is
somewhat less troublesome if fewer than 11 factors are considered. Furthermore,
the two-factor interactions could possibly be untangled by adding a
second fraction with all signs reversed, provided that only a few of them
were large.
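These counts can be checked numerically by regressing each two-factor interaction column on the main-effect columns. The sketch below again assumes the commonly tabulated N = 12 generating row (Table 9 is not available in this text), and the variable names are illustrative.

```python
import numpy as np
from itertools import combinations

# ASSUMPTION: standard N = 12 Plackett-Burman generating row.
row = np.array([+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1])
design = np.vstack([np.column_stack([np.roll(row, s) for s in range(11)]),
                    -np.ones(11, dtype=int)])

pairs = list(combinations(range(11), 2))                 # the 55 two-factor interactions
X1 = np.column_stack([np.ones(12, dtype=int), design])   # intercept + 11 main effects
X2 = np.column_stack([design[:, i] * design[:, j] for i, j in pairs])

# Bias of each fitted coefficient due to each omitted interaction.
alias = np.linalg.solve(X1.T @ X1, X1.T @ X2)[1:]        # drop the intercept row
nonzero = np.abs(alias) > 1e-9

print(nonzero.sum(axis=1))   # interactions contaminating each main effect
print(nonzero.sum(axis=0))   # main effects contaminated by each interaction
```

Under that assumption each main effect picks up 45 interactions (all those not involving it), each interaction enters 9 main-effect estimates, and every nonzero coefficient has magnitude 1/3, so the aliasing is partial rather than complete.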

Table 9

Plus and Minus Signs for the Plackett-Burman Designs

Table 10

EXAMPLE 1. We shall now illustrate some of the above ideas with an example.
The problem setting is inventory control, and we wish to determine the effect
of various parameters on the average annual cost. We note that simulation
methods are not required for this problem, as there are analytical models that
can be used to describe the system. However, the problem has been kept simple
deliberately to illustrate the experimental methods.

There are three items in the inventory. These items are military belts
such as used in jeans and other casual apparel. Item 1 is hardware, item 2 is
dyed webbing, and item 3 is natural webbing. The following quantities are


                                  Item 1           Item 2           Item 3
Annual demand (D)                 500,000 doz.     300,000 doz.     200,000 doz.
Demand during a lead time,        μ1 = 20,000      μ2 = 6,000       μ3 = 4,000
  X ~ N(μ, σ²)                    σ1 = 3,000       σ2 = 900         σ3 = 600
Lead time T                       2 weeks          1 week           1 week
Fixed cost A                      $35 per order    $15 per order    $15 per order
Unit variable cost C              $6.25/doz.       $3.10/doz.       $2.80/doz.
Carrying cost h                   $.20             $.28             $.28
Cost per unit short π             *                $.40             $.40

The following variables represent parameters that we would like to investigate
to learn their effect on the system:

Variable                    Level    Item 1    Item 2    Item 3
Order quantity Q            1        10,000    4,000     3,000
                            2        20,000    8,000     6,500
Reorder point r             1        17,000    5,000     3,500
                            2        35,000    11,000    7,000
Cost per unit short π*      1        .3
                            2        .5

Note that there are seven factors, each at two levels. The $2^{7-4}_{III}$ design
in Table 6 is run, using the high and low levels of these factors shown above.
Let factors A, B, and C denote the order quantities for items 1, 2, and 3; D,
E, and F denote the reorder points for items 1, 2, and 3; and G denote the
shortage cost for item 1. From the design in Table 6, we obtain the following:


Treatment      Response      Effect + Aliases
Combination    ($ x 1000)    (2-1)               Estimate
def            4,626         A                   -65
afg            4,693         B                   50
beg            4,718         D                   -180
abd            4,655         C                   -66
cdg            4,662         E                   -72
ace            4,653         F                   -58
bcf            4,685         G                   80
abcdefg        4,626
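As a check on the table, the linear combinations can be recomputed from the printed responses using the sign columns of Table 6. This is a sketch that assumes the tabulated estimates are unscaled contrasts (signed sums of the responses), which is an inference from the numbers rather than something the text states.

```python
# Sign columns of Table 6, rows in the order def, afg, beg, abd, cdg, ace, bcf, abcdefg.
base = [(-1, -1, -1), (1, -1, -1), (-1, 1, -1), (1, 1, -1),
        (-1, -1, 1), (1, -1, 1), (-1, 1, 1), (1, 1, 1)]
design = [(a, b, c, a * b, a * c, b * c, a * b * c) for a, b, c in base]
response = [4626, 4693, 4718, 4655, 4662, 4653, 4685, 4626]   # $ x 1000, as printed

# Contrast for each factor: signed sum of the responses down that factor's column.
contrasts = {
    name: sum(run[j] * y for run, y in zip(design, response))
    for j, name in enumerate("ABCDEFG")
}
print(contrasts)
# {'A': -64, 'B': 50, 'C': -66, 'D': -180, 'E': -72, 'F': -58, 'G': 80}
```

Six of the seven values reproduce the printed estimates exactly; A comes out as -64 against the printed -65, which may simply reflect rounding in the printed responses.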

Obviously, the effect of D (and its aliases) is large. Since this is the only
large effect, we might stop and conclude that, over the range of variation
studied, only item 1's reorder point seriously affects the system. However, to
be more certain of these results, we run the alternate fraction given in
Table 8. This gives the following:


Treatment      Response      Effect + Aliases
Combination    ($ x 1000)    (2-2)               Estimate
abcg           4,683
bcde           4,632
acdf           4,656
cefg           4,704
abef           4,647
bdfg           4,640
adeg           4,640
(1)            4,716


Combining the results from the two fractions, we obtain

From $\frac{1}{2}(\ell_i + \ell^*_i)$      From $\frac{1}{2}(\ell_i - \ell^*_i)$
BD + CE + FG =                             A = -65
AD + CF + EG = 82                          B = -32
AE + BF + DG = -49                         C = -17
AB + CG + EF =                             D = -181
AC + BG + DF =                             E = -72
BC + AG + DE = -17                         F = -41
CD + BE + AF = 32
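The combination of the two fractions can be retraced from the printed responses. The sketch below rests on two assumptions that are not stated in the text: the second set of responses is listed in the row order of Table 8, and each $\ell^*_i$ is formed by applying the Table 6 sign column to the matching row of the second fraction (which yields combinations of the form "minus main effect plus interactions", as in (2-2)).

```python
base = [(-1, -1, -1), (1, -1, -1), (-1, 1, -1), (1, 1, -1),
        (-1, -1, 1), (1, -1, 1), (-1, 1, 1), (1, 1, 1)]
signs = [(a, b, c, a * b, a * c, b * c, a * b * c) for a, b, c in base]  # Table 6 columns
first = [4626, 4693, 4718, 4655, 4662, 4653, 4685, 4626]    # Table 6 row order
second = [4683, 4632, 4656, 4704, 4647, 4640, 4640, 4716]   # assumed Table 8 row order

names = "ABCDEFG"
l = {n: sum(s[j] * y for s, y in zip(signs, first)) for j, n in enumerate(names)}
l_star = {n: sum(s[j] * y for s, y in zip(signs, second)) for j, n in enumerate(names)}

main = {n: (l[n] - l_star[n]) / 2 for n in names}     # main effects
groups = {n: (l[n] + l_star[n]) / 2 for n in names}   # two-factor alias group in column n

print(main)
print(groups)
```

Under these assumptions the computation reproduces every value printed in the table (for example D = -181 and AD + CF + EG = 82); the entries that are blank in the table are produced by the computation only and should not be read as values from the report.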


Clearly the main effect of D is large. Since the effect of D is over
twice as large as the next largest effect, we are tempted to conclude that it
is the only significant factor. This is confirmed by viewing the normal
probability plot of the estimates of the effects, Figure 5. Point 1 on this
plot is D. It is significantly off the straight line formed by the other
effects. We conclude that only the effect of D is significant.

Figure 5.  Normal Probability Plot

2-2.3  Resolution IV Designs

A $2^{k-p}$ fractional factorial design is of resolution IV if main effects
are clear of two-factor interactions and some two-factor interactions are
aliased with each other. Thus, if three-factor and higher interactions are
suppressed, main effects may be estimated directly in a $2^{k-p}_{IV}$ design. The
$2^{6-2}$ design in Table 3 is of resolution IV. Furthermore, the two combined
fractions of the $2^{7-4}_{III}$ design in Example 1 form a $2^{7-3}_{IV}$ design.

Any $2^{k-p}_{IV}$ design must contain at least 2k runs. Resolution IV designs
that contain exactly 2k runs are called minimal designs. Resolution IV designs
may be obtained from resolution III designs by the process of fold over. To
fold over a $2^{k-p}_{III}$ design simply add to the original fraction a second
fraction with all signs reversed. Then the plus signs in the identity column I
in the first fraction are switched in the second fraction, and a (k+1)st factor
associated with this column. The result is a $2^{k+1-p}_{IV}$ fractional factorial
design. The process is demonstrated in Table 11 for the $2^{3-1}_{III}$ design. The
resulting design is a $2^{4-1}_{IV}$ design with generating relation I = ABCD.


Table 11

A $2^{4-1}_{IV}$ Design Obtained by Fold Over

 D
 I     A    B    C

Original $2^{3-1}_{III}$, I = ABC
 +     -    -    +
 +     +    -    -
 +     -    +    -
 +     +    +    +

Second $2^{3-1}_{III}$ with signs switched
 -     +    +    -
 -     -    +    +
 -     +    -    +
 -     -    -    -

As a second example of fold-over, consider the $2^{7-4}_{III}$ design used in
Example 1 (also see Table 6). By adding to the design the fraction in Table 8
and associating an 8th factor H with the column I = + in Table 6 and I = - in
Table 8, we would have a $2^{8-4}_{IV}$ plan. The generating relation for this
design is

I = ABDH = ACEH = BCFH = ABCG = BCDE = ACDF = CDGH
  = ABEF = BEGH = AFGH = DEFH = BDFG = ADEG = CEFG
  = ABCDEFGH.

The defining relation of the new design will consist of all words from the old
design that contain an even number of letters, and all words from the old
design that contain an odd number of letters will have the new letter added.

Any resolution IV design will contain a $2^3$ complete factorial design.
That is, it will provide r replicates of a $2^3$ design in any 3 of the original
factors provided the design contains $r2^3$ points. Thus the $2^{8-4}_{IV}$ plan
above provides two replicates of a $2^3$ in any subset of 3 of the original 8
factors. This often has important applications in screening.
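Both properties of the folded-over design can be verified by enumeration. This is a sketch with a -1/+1 coding; the helper names `dot` and `is_replicated_full` are introduced here for illustration.

```python
from itertools import combinations, product
from collections import Counter

# Principal 2^(7-4) fraction: D = AB, E = AC, F = BC, G = ABC.
first = [(a, b, c, a * b, a * c, b * c, a * b * c)
         for c, b, a in product([-1, 1], repeat=3)]
# Fold over: reverse every sign; H is +1 in the first fraction and -1 in the second.
folded = [run + (1,) for run in first] + [tuple(-x for x in run) + (-1,) for run in first]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

cols = list(zip(*folded))  # one tuple per factor column (A..H)

# Resolution IV check: each main-effect column is orthogonal to each two-factor product column.
clear = all(
    dot(cols[i], [x * y for x, y in zip(cols[j], cols[k])]) == 0
    for i in range(8) for j, k in combinations(range(8), 2)
)

def is_replicated_full(design, subset, reps):
    """True if the projection onto `subset` is `reps` replicates of a full factorial."""
    counts = Counter(tuple(run[j] for j in subset) for run in design)
    return len(counts) == 2 ** len(subset) and set(counts.values()) == {reps}

two_reps = all(is_replicated_full(folded, trio, 2) for trio in combinations(range(8), 3))
print(clear, two_reps)  # True True
```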

EXAMPLE 2. Consider the inventory problem in Example 1. We will fold over the
original $2^{7-4}_{III}$ design in this example, giving a $2^{8-4}_{IV}$ plan, with
the 8th factor taken to be the mean of the lead time demand distribution for
item 1. In the first fraction the mean is 20,000, while in the second it is
25,000. The following results are obtained.


Treatment
Combination    Response ($ x 1000)    Estimate    Effect
def            4626
afg            4693                   96          FG + AH + BD + CE
beg            4718                   258         EG + AD + BH + CF
abd            4655                   288         AB + CG + DH + EF
cdg            4662                   -170        DG + AE + CH + BF
ace            4653                   -24         AC + BG + EH + DF
bcf            4685                   -58         BC + AG + FH + DE
abcdefg        4626                   -8          CD + AF + GH + BE
abcgh          4742                   278         -H
bcdeh          4631                   224         -A
acdfh          4655                   158         -B
cefgh          4822                   648         -D
abefh          4682                   -38         -C
bdfgh          4639                   120         -E
adegh          4639                   58          -F
h              4786                   -168        -G

Once again, only the effect of D appears large. This is confirmed by the normal
probability plot given in Figure 6.

2-2.4  Remarks on Computations and Aliasing

To this point we have used the relatively simple computational methods
associated with the $2^{k-p}$ designs, assuming that at least a regular fraction
is available. Sometimes an experimenter will want to update the estimates of the
effects following each additional run. This might often occur when augmenting
a $2^{k-p}$ design with additional runs to estimate certain interactions. If we
consider the model

$$y = X\beta + \varepsilon$$

we can give an updating equation for $\hat{\beta}$ in terms of each new run,
assuming that the starting point was a block of runs giving orthogonal minimum
variance estimates of $\beta$ (such as the $2^{k-p}$ designs). This updating
equation is

$$\hat{\beta}_{NEW} = \hat{\beta}_{OLD} + (mN + p)^{-1} \sum_{i=1}^{n} (y_i - \hat{y}_i)\, x_i \qquad (2\text{-}3)$$

where p is the number of model parameters, N is the block size, m is the number
of blocks completed, $y_i$ is the new observation associated with the new vector
of variable settings $x_i$ (i = 1, 2, ..., n < N), and
$\hat{y}_i = \hat{\beta}'_{OLD} x_i$. Equation (2-3) was derived by Hunter [1964].
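For a single added run (n = 1), the updating equation can be compared against an ordinary least-squares refit. The sketch below uses made-up responses and a main-effects model on one block of a full $2^3$ design; all numerical values are hypothetical and serve only to exercise the formula.

```python
import numpy as np
from itertools import product

# One block (m = 1) of a full 2^3 design, N = 8 runs; intercept plus three
# main effects, so p = 4. The responses are invented for illustration.
runs = np.array(list(product([-1, 1], repeat=3)))
X = np.column_stack([np.ones(8), runs])
y = np.array([12.0, 15.0, 9.0, 14.0, 11.0, 18.0, 10.0, 16.0])
m, N, p = 1, 8, X.shape[1]

beta_old = X.T @ y / (m * N)          # orthogonal design: X'X = mN * I

# One additional run at settings x_new with observed response y_new.
x_new = np.array([1.0, 1.0, -1.0, 1.0])
y_new = 20.0
beta_new = beta_old + (y_new - beta_old @ x_new) * x_new / (m * N + p)

# Reference: least squares on all nine runs.
beta_ls, *_ = np.linalg.lstsq(np.vstack([X, x_new]), np.append(y, y_new), rcond=None)
print(np.allclose(beta_new, beta_ls))  # True
```

The agreement follows from the fact that X'X is mN times the identity for an orthogonal block and that every coded setting vector has squared length p. The sketch checks only the single-run case.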

We may also give a general result concerning aliasing. If the true
model is

$$ \mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon} $$

but the experimenter has estimated only the parameters $\boldsymbol{\beta}_1$
using the model

$$ \mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \boldsymbol{\varepsilon} $$

then it is well known that $\hat{\boldsymbol{\beta}}_1$ is biased, such that

$$ E(\hat{\boldsymbol{\beta}}_1) = \boldsymbol{\beta}_1 + \mathbf{A}\boldsymbol{\beta}_2 $$

where the matrix $\mathbf{A} = (\mathbf{X}_1'\mathbf{X}_1)^{-1}(\mathbf{X}_1'\mathbf{X}_2)$
is called the alias matrix. This general result can be used to work out the
aliases for effects in the $2^{k-p}$ system. It is often useful in more complex
design settings than the $2^{k-p}$, particularly in irregular fractions, such as
discussed in the next section.
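To make the alias matrix concrete, the sketch below computes
$(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2$ for a $2^{3-1}$ half
fraction with I = ABC, where the fitted model contains the mean and main effects
and the omitted terms are the two-factor interactions. The choice of design and
the function name `alias_matrix` are assumptions made for illustration.

```python
from fractions import Fraction

def alias_matrix(X1, X2):
    """Return A = (X1'X1)^{-1} X1'X2 using exact Gauss-Jordan elimination."""
    p, r = len(X1[0]), len(X2[0])
    # Augmented matrix [X1'X1 | X1'X2]
    M = [[Fraction(sum(row[i] * row[j] for row in X1)) for j in range(p)]
         + [Fraction(sum(r1[i] * r2[j] for r1, r2 in zip(X1, X2))) for j in range(r)]
         for i in range(p)]
    for c in range(p):
        pivot_row = next(k for k in range(c, p) if M[k][c] != 0)
        M[c], M[pivot_row] = M[pivot_row], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for k in range(p):
            if k != c:
                factor = M[k][c]
                M[k] = [vk - factor * vc for vk, vc in zip(M[k], M[c])]
    return [row[p:] for row in M]

# 2^(3-1) half fraction with I = ABC: runs a, b, c, abc as (A, B, C) levels
runs = [(1, -1, -1), (-1, 1, -1), (-1, -1, 1), (1, 1, 1)]
X1 = [[1, a, b, c] for a, b, c in runs]            # fitted terms: mean, A, B, C
X2 = [[a * b, a * c, b * c] for a, b, c in runs]   # omitted terms: AB, AC, BC

A = alias_matrix(X1, X2)
for name, row in zip(["mean", "A", "B", "C"], A):
    print(name, [int(v) for v in row])             # columns: AB, AC, BC
```

Each row of the result shows which omitted interaction contaminates the
corresponding fitted term; here A is aliased with BC, B with AC, and C with AB.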

2-3. Irregular Fractions of the $2^k$ Design

There are some multifactor screening situations in which higher
saturation of the design than can be accomplished with regular fractions would
be justified. This would be the case, for example, when computer runs are very
time-consuming or expensive. In these situations, certain irregular fractional
factorial designs may be useful. Often in these designs, the experimenter will
only be able to estimate certain parameters in the model and will have few
remaining degrees of freedom. Furthermore, the estimates of the effects will
generally be nonorthogonal.

The simplest irregular fractions result from augmentation of a balanced
$2^{k-p}$ fraction. One may view the process of combining fractions from the
same family in the $2^{k-p}$ series as augmentation designs, where the augmented
set is as large as the initial set. The methods presented here are based on
smaller augmented sets, usually 1, 2, 4, or 8 runs, added with the objective of
estimating two-factor interactions.

As an elementary example, consider the $2^{3-1}$ design. If only the A
effect is large, then an estimate of the A effect clear of the BC interaction
can be obtained with only one additional run. Suppose that I = -ABC, so that the
runs made are (1), ab, ac, and bc. Now consider observation a. Since
E(a) = $\mu$ + A - B - C - AB - AC + BC, we have, if B = C = AB = AC = 0,

$$ E(a) = \mu + A + BC $$

If we have an estimate $\hat{\mu}$ from the original fraction, then A + BC is
estimated by $\ell^* = a - \hat{\mu}$. We can estimate A - BC directly from the
first fraction as $\ell = -(1) + ab + ac - bc$. Then, once the two contrasts are
expressed on the same scale, $\ell^* + \ell$ estimates A and $\ell^* - \ell$
estimates BC.
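The one-run augmentation can be traced numerically. The sketch below assumes a
noise-free response in which only A and BC are active, with hypothetical values
$\mu$ = 50, A = 5, and BC = 2 written as regression coefficients; the divisors 4
and 2 come from that scaling and are part of the assumption, not of the text.

```python
def response(a, b, c, mu=50, A=5, BC=2):
    """Noise-free response with only the A and BC terms active (hypothetical)."""
    return mu + A * a + BC * b * c

# Original half fraction with I = -ABC: runs (1), ab, ac, bc
y_1 = response(-1, -1, -1)
y_ab = response(1, 1, -1)
y_ac = response(1, -1, 1)
y_bc = response(-1, 1, 1)

mu_hat = (y_1 + y_ab + y_ac + y_bc) / 4      # estimate of the mean
ell = (-y_1 + y_ab + y_ac - y_bc) / 4        # estimates A - BC

# Single additional run: treatment combination a
y_a = response(1, -1, -1)
ell_star = y_a - mu_hat                      # estimates A + BC

A_hat = (ell_star + ell) / 2
BC_hat = (ell_star - ell) / 2
print(A_hat, BC_hat)
```

With real simulation output the added run carries its own error, so the
separated estimates are less precise than those from a full second fraction.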

Similar augmentation schemes can be derived for most other designs in
the series, either to separate a single two-factor interaction, a pair of
two-factor interactions, or four such interactions. Daniel [1972] is the basic
reference in this area. Addelman [1969] discusses the same problem, in more
detail than Daniel [1962], but with less adaptation of results to special cases.

Three-quarter replicates of the $2^{k-p}$ series are often highly useful.
These designs may be viewed as constructed by either omitting a quarter-fraction
from the full $2^k$ or by adding a quarter-fraction to a one-half fraction. A
good survey of these designs is in John [1971]. We will illustrate one of these
designs with an example.


EXAMPLE 3. Suppose that in Example 1, only items 1 and 2 are of interest. We
would like to obtain estimates of all 4 main effects (the order quantities and
reorder points) and the 6 two-factor interactions. Obviously a $2^{4-1}$ will not
do, since it contains only 8 runs and we must estimate 10 parameters. The full
$2^4$ design, requiring 16 runs, is considered too expensive. Only 12 runs can be
taken.

We can estimate all 10 effects with 12 observations by using a 3/4
fraction of the $2^4$. Consider the quarter replicates (I = ±AB = ±ACD):

    (1)  I = +AB = +ACD = +BCD;   d, ab, c, abcd
    (2)  I = +AB = -ACD = -BCD;   (1), abd, cd, abc
    (3)  I = -AB = +ACD = -BCD;   bd, a, bc, acd
    (4)  I = -AB = -ACD = +BCD;   b, ad, bcd, ac.

Omit the first fraction and run only the last three. Now overlap these three
quarter replicates as follows to estimate the effects:

    Fraction 1:  (2) + (3)    I = -BCD

        A - ABCD   = -110
        AB - ACD   =    0
        AD - ABC   =    0
        ABD - AC   = -110

    Fraction 2:  (2) + (4)    I = -ACD

        B - ABCD   =  -32
        AB - BCD   =    0
        BD - ABC   =   32
        ABD - BC   =    0

    Fraction 3:  (3) + (4)    I = -AB

        C - ABC    = -318
        D - ABD    =  -88
        CD - ABCD  =    0

The estimates of the 4 main effects and 6 two-factor interactions are shown
above, assuming that higher-order interactions are negligible. Once again,
note that only the reorder point for item 1 seems to produce a significant result.
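The four quarter replicates in Example 3 can be generated directly from their
defining relations. The following sketch enumerates the $2^4$ runs and keeps
those with the required signs of AB and ACD; the function names `label` and
`quarter` are introduced here for illustration only.

```python
from itertools import product

FACTORS = "abcd"

def label(levels):
    """Treatment-combination label, e.g. (+1, +1, -1, +1) -> 'abd'."""
    name = "".join(f for f, x in zip(FACTORS, levels) if x == 1)
    return name or "(1)"

def quarter(sign_ab, sign_acd):
    """Runs of the 2^4 with AB = sign_ab and ACD = sign_acd."""
    return [run for run in product((-1, 1), repeat=4)
            if run[0] * run[1] == sign_ab
            and run[0] * run[2] * run[3] == sign_acd]

quarters = {1: quarter(+1, +1), 2: quarter(+1, -1),
            3: quarter(-1, +1), 4: quarter(-1, -1)}

for number, runs in quarters.items():
    print(number, sorted(label(run) for run in runs))

# The 3/4 replicate omits quarter (1)
three_quarter = quarters[2] + quarters[3] + quarters[4]

# Half fraction formed by (2) + (3): every run satisfies BCD = -1
half_23 = quarters[2] + quarters[3]
print(all(b * c * d == -1 for a, b, c, d in half_23))
```

The same check can be repeated for (2) + (4) with ACD and for (3) + (4) with AB
to confirm the three overlapping half fractions used in the analysis.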

Addelman and Kempthorne [1961] have developed a series of orthogonal main
effect plans. These designs are useful in cases where only main effects are of
interest. In many cases factors with either 2 or 3 levels can be considered.

Much other work has been done on irregular fractions of the $2^m 3^n$ series.
Margolin [1968] [1972] has done much of the work in this area. Webb [1965] [1971]
has also developed very compact mixed fractional factorials from this series,
involving 20 or fewer runs. These plans all have very heavy two-factor
interaction aliasing. Of related interest is Webb [1968].

2-4.  Supersaturated  Plans 

These are two-level designs devised by Booth and Cox [1962]. In these
designs, each of k factors appears at the high and low levels N/2 times, where
N ≤ k. We assume that N is even. Clearly not all estimates of the effects can
be orthogonal, since N ≤ k. Booth and Cox [1962] generated these designs to
obtain "near-orthogonality" by using the design criterion

$$ \min \left( \max_{i \neq j} |\mathbf{d}_i'\mathbf{d}_j| \right) $$

where $\mathbf{d}_i$ is a vector denoting the levels of factor i. The vector
$\mathbf{d}_i$ will consist of N/2 +1's and N/2 -1's. Booth and Cox [1962]
tabulate designs for N = 12 and k ≤ 16, 20, 24; N = 18 and k ≤ 24, 30, 36; and
N = 24 and k ≤ 30. They describe an algorithm for generating other designs,
although the procedure may be very inefficient.
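The inner quantity in the Booth and Cox criterion is easy to evaluate for any
candidate design. The sketch below computes $\max_{i \neq j} |\mathbf{d}_i'\mathbf{d}_j|$
for the columns of a ±1 design matrix. The two small designs are hypothetical
illustrations, not tabulated Booth and Cox plans, and the search over candidate
designs (the outer minimization) is not attempted here.

```python
from itertools import combinations, product

def max_abs_inner_product(design):
    """Largest |d_i' d_j| over all pairs of distinct columns of a +/-1 design."""
    columns = list(zip(*design))
    return max(abs(sum(x * y for x, y in zip(ci, cj)))
               for ci, cj in combinations(columns, 2))

# Orthogonal reference: 2^2 factorial with columns A, B, AB
orthogonal = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

# Hypothetical balanced 6-run, 4-factor design (each column has three +1's)
six_run = [
    (+1, +1, +1, +1),
    (+1, +1, -1, -1),
    (+1, -1, +1, -1),
    (-1, +1, -1, +1),
    (-1, -1, +1, +1),
    (-1, -1, -1, -1),
]

print(max_abs_inner_product(orthogonal))
print(max_abs_inner_product(six_run))
```

A value of zero indicates fully orthogonal columns; larger values indicate
heavier partial aliasing between at least one pair of factors.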

EXAMPLE 4. To illustrate the use of a supersaturated design, consider the
inventory problem in Example 1. We now add a fourth item to the inventory, with
the following parameters:

    D = 350,000,  σ = 2,000,  A = $25,  C = $4.30,  h = $0.45,  π = $0.50

The following 13 factors will be considered in a screening experiment:

    Factor    High Level    Low Level
    Q1          10,000        20,000
    Q2           4,000         8,000
    Q3           3,000         6,500
    Q4           5,000         9,000
    r1          17,000        35,000
    r2           5,000        11,000
    r3           3,500         7,000
    r4           7,000        15,000
    π1           $0.30         $0.50
    w1          20,000        25,000
    w2           6,000         8,000
    w3           2,500         5,500
    w4           4,000        10,000

The 13-factor Booth and Cox design to investigate these factors, and the
responses obtained, are shown below:

    Factors:  Q1 Q2 Q3 Q4 r1 r2 r3 r4 π1 w1 w2 w3 w4
    Letters:  A  B  C  D  E  F  G  H  I  J  K  L  M

    Run    Factor levels                 Response
     1     + + + + + + + + + + + - -      $6138
     2     - + + + - - - + - - - -         6166
     3     - + + + - - - + - - + +         6247
     4     + + + - - - + - - + - - +       6310
     5     + + - - - + - - + - + + +       6328
     6     + - - - + - - + - + + +         6275
     7     - - - + - - + - + + + + -       6419
     8     - - + - - + - + + + - -         6358
     9     - + - - + - + + - - + +         6150
    10     + - - + - + + + - - - -         6158
    11     - - + - + + + - - - + - -       6137
    12     - + ? + + ? ? -                 6135

The contrasts for each factor are obtained in the usual way. These contrasts
are:

    A =  -71        H = -169
    B = -205        I =  297
    C = -109        J =  449
    D = -295        K =  267
    E = -819        L =  733
    F = -197        M =  115

Clearly the largest factor effect is E (or $r_1$), followed closely by L (or
$w_3$). There are also several other moderately large contrasts that may indicate
significant factors. This example illustrates one of the major disadvantages of
a supersaturated design. Following the initial experiment, if several effects
appear to be potentially active, there is no simple additional set of experiments
that can be run to isolate the factors of interest. This is in contrast to the
$2^{k-p}$ series, where additional fractions from the same family can always be
used to gain further information on potentially active factors, or to untangle
the interactions. Moreover, the aliasing that is present in the contrasts from a
supersaturated design is very heavy and irregular, and this will frequently
present a confusing picture to the analyst. In this light, the supersaturated
designs are likely to be little better than the "random balance" designs
proposed by Satterthwaite [1959] and Budne [1959].

2-5. Group Screening Designs
2-5.1.  General  Approach 

These  designs  are  intended  for  use  in  situations  where  the  following 
conditions  apply: 

1.  The  number  of  factors  k is  relatively  large 

2.  All  factors  have  the  same  prior  probability  of  being  active 

3.  There  are  no  interactions  between  active  factors 

4.  The  direction  of  all  effects  is  known 

5.  The errors associated with the observations are NID(0, $\sigma^2$).

A group screening design is conducted by forming the original k factors into g
groups. Then each group is considered as a single factor and investigated
through a design such as the $2^{k-p}$. If a group-factor is negligible, then all
factors within that group are considered insignificant. Group factors that
exhibit significant effects are then divided into smaller groups for subsequent
experimentation.

These designs were introduced by Watson [1961], who proposed that only
two stages be used. Thus in the second stage, we experiment with the original
factors. Patel [1962] and Li [1972] have generalized these results to multiple
stages.

2-5.2 Two-stage Group Screening

The k factors will be divided into g groups. Watson [1961] originally
suggested that all groups be of the same size, although this assumption is
unnecessary. Because the direction of effects is known, we can label the high
level of each factor as the level producing the largest response. The upper
level of a group factor consists of running each factor in the group at the high
level. If this arrangement is not followed, some factor effects may cancel.
Watson [1961] derives the optimum group size to be

$$ f^* = [(1 - \alpha)p]^{-1/2} \qquad (2-4) $$

where p is an estimate of the fraction of active factors and $\alpha$ is the
significance level used for the first-stage statistical analysis. This formula
attempts to minimize the total number of runs required in both stages. It also
implies that groups will be of equal size. If we have no prior estimate of p, or
if the direction of some effects is not known, then (2-4) is invalid.
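A short calculation shows how quickly the group size in (2-4) shrinks as the
anticipated fraction of active factors grows. The sketch below evaluates
$f^* = [(1 - \alpha)p]^{-1/2}$; the values of p and $\alpha$ are hypothetical,
and rounding the result to a whole number of factors per group is left to the
analyst.

```python
import math

def watson_group_size(p, alpha):
    """Optimum group size f* = [(1 - alpha) * p] ** (-1/2) from equation (2-4)."""
    if not 0 < p <= 1 or not 0 <= alpha < 1:
        raise ValueError("need 0 < p <= 1 and 0 <= alpha < 1")
    return 1.0 / math.sqrt((1.0 - alpha) * p)

for p in (0.01, 0.04, 0.10, 0.25):
    print(p, round(watson_group_size(p, alpha=0.05), 2))
```

For example, with no first-stage error allowance ($\alpha$ = 0) and p = 0.04
the formula gives groups of about five factors.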

Generally, we would expect p to vary from factor to factor. That is,
we would have considerable knowledge about some factors, and little knowledge
about others. Note that as p increases, the optimum group size decreases.
Therefore, it would seem reasonable to use groups of different sizes, depending
on our knowledge of p for each factor. Factors that we strongly suspect are
significant would be run in very small groups (perhaps of size 1). Furthermore,
factors for which we do not know the direction of the effect could be tested in
groups of size 1 to prevent the cancellation effect.

As a hypothetical example of group screening, suppose we have 17 factors.
Suppose that the direction of factor 1 is unknown, and that we are almost
positive that factor 2 is active. The possible directions of the other 15
factors are known. Therefore, a logical arrangement of the groups would be:

    Group Factor    Original Factors
    A               1
    B               2
    C               3, 4, 5, 6, 7
    D               8, 9, 10, 11, 12
    E               13, 14, 15, 16, 17

These five group factors could be investigated in the first stage using a
$2^{5-2}$ design (8 runs). This would permit investigation of all main group
effects, but these effects would be aliased with the two-factor interactions of
the group effects. If we wanted to use 16 runs, the $2^{5-1}$ design would allow
estimation of all main effects and two-factor interactions of the group factors.

If the assumption of no active two-factor interactions between the original
factors holds, then the factors may be formed into groups on an arbitrary basis.
However, some choices of grouping arrangements will lead to more easily
interpreted results, or to smaller sets of active factors to be investigated at
the second stage. Sometimes we can use our knowledge of the problem to form the
groups. For example, we might place all similar factors in the same groups.
Thus if we are simulating an inventory system, all reorder quantities could form
one group, all reorder levels a second group, etc. If some two-factor
interactions may be active, then we must take more care in forming the groups.
Generally, a significant two-factor interaction (say AB) biases the estimates
of a third factor (say C) if and only if all three factors belong to separate
group factors. Therefore, if we suspect that some two-factor interactions
are active, then all the factors involved in those interactions should be
placed in the same group. For a proof of this result, see Kleijnen [1975a, b].

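In the second stage of a group screening design, in addition to investigating
the set of potentially active factors, we must also choose levels for the
negligible factors identified in the first stage. Recall that the linear model
can be written as

$$ \mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon} $$

where $\boldsymbol{\beta}_1$ now contains the set of potentially active factors
and $\boldsymbol{\beta}_2$ contains the set of factors tentatively identified as
negligible at the first stage. The matrix $\mathbf{X}_1$ consists of the levels
assigned to the active factors in the second stage and $\mathbf{X}_2$ consists
of the levels assigned to the negligible factors. Now, the expected value of the
least squares estimate of $\boldsymbol{\beta}_1$,
$\hat{\boldsymbol{\beta}}_1 = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{y}$, is

$$ E(\hat{\boldsymbol{\beta}}_1) = \boldsymbol{\beta}_1
   + (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2\boldsymbol{\beta}_2 $$

Clearly, if all the factors thought to be insignificant from stage 1 really are
insignificant, then $\boldsymbol{\beta}_2 = \mathbf{0}$ and
$\hat{\boldsymbol{\beta}}_1$ is an unbiased estimator of $\boldsymbol{\beta}_1$.
However, if one or more of these effects is active, then
$\boldsymbol{\beta}_2 \neq \mathbf{0}$ and $\hat{\boldsymbol{\beta}}_1$ is a
biased estimator of $\boldsymbol{\beta}_1$.

The extent of the bias in $\hat{\boldsymbol{\beta}}_1$ is given by the alias
matrix $\mathbf{A} = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2$.
This may be controlled by the choice of factor levels for the variables in
$\mathbf{X}_2$. Assuming that two-level factors are employed, then if all levels
in $\mathbf{X}_2$ are identical (say +1, the high level) then the coefficients in
$\boldsymbol{\beta}_2$ will bias only the intercept or overall mean term in
$\boldsymbol{\beta}_1$. No other effects in $\boldsymbol{\beta}_1$ will be biased
by factors in $\boldsymbol{\beta}_2$.

To prove this, suppose that $\mathbf{X}_1$ is n x p, $\boldsymbol{\beta}_1$ is
p x 1, $\mathbf{X}_2$ is n x r, and $\boldsymbol{\beta}_2$ is r x 1. If the
second-stage design is a $2^k$ or an orthogonal fraction of the $2^k$, then
$(\mathbf{X}_1'\mathbf{X}_1)^{-1} = (1/n)\mathbf{I}_p$. Furthermore, if all of
the negligible factors in $\mathbf{X}_2$ are set at their high levels, then
$\mathbf{X}_2$ is an n x r matrix of 1's. Now $\mathbf{X}_1$ is an n x p matrix,
the first column of which consists of 1's (to account for the overall mean
$\mu$) and the remaining p-1 columns consist of the +1 and -1 levels from the
orthogonal $2^k$ design. Therefore, $\mathbf{X}_1'\mathbf{X}_2$ is a p x r
matrix, the first row of which consists of n's, and all the remaining elements
are zero. Therefore,

$$ (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2
   = (1/n)\,\mathbf{I}_p\,\mathbf{X}_1'\mathbf{X}_2 $$

is a p x r matrix whose first row consists of 1's and whose remaining elements
are zero, and the alias structure is

$$ E(\hat{\beta}_0) = \beta_0 + \sum_{i=p}^{p+r-1} \beta_i $$

$$ E(\hat{\beta}_i) = \beta_i, \qquad i = 1, 2, \ldots, p-1 $$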
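The alias structure above can be confirmed numerically for a small case. The
sketch below uses a $2^2$ second-stage design in two active factors and three
"negligible" factors held at fixed levels throughout; the particular levels
(+1, +1, -1) are hypothetical and chosen to show that a factor held at its low
level enters the intercept bias with a negative sign.

```python
from fractions import Fraction
from itertools import product

# Second-stage 2^2 design in two active factors; model columns: mean, x1, x2
X1 = [[1, a, b] for a, b in product((-1, 1), repeat=2)]
n, p = len(X1), len(X1[0])

# Hypothetical fixed levels of three factors judged negligible at stage 1
held_levels = [1, 1, -1]
r = len(held_levels)
X2 = [list(held_levels) for _ in range(n)]

# For an orthogonal design (X1'X1)^{-1} = (1/n) I_p, so A = (1/n) X1'X2
A = [[Fraction(sum(X1[u][i] * X2[u][j] for u in range(n)), n)
      for j in range(r)]
     for i in range(p)]

for name, row in zip(["mean", "x1", "x2"], A):
    print(name, [int(v) for v in row])
```

Only the first row of the alias matrix is nonzero, so the held factors can bias
the intercept but not the estimated effects of the active factors.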


Thus the r elements in $\boldsymbol{\beta}_2$ bias only the estimate of the
intercept $\beta_0$. Strictly speaking, the r negligible factors do not all have
to be held at the high level. However, each factor must be held at the same
level throughout the experiment.

EXAMPLE 5. Consider the inventory problem in Example 4. Suppose that there are
13 factors of interest, $Q_1$, $r_1$, $w_1$, $\pi_1$, $Q_2$, $r_2$, $w_2$, $Q_3$,
$r_3$, $w_3$, $Q_4$, $r_4$, and $w_4$. We will arrange these factors in 4 groups,
according to item, as follows:

    Group Factor    Original Factors
    A               Q1, r1, w1, π1
    B               Q2, r2, w2
    C               Q3, r3, w3
    D               Q4, r4, w4

A $2^{4-1}$ design is used to analyze these four group factors. The results are
summarized below:

    Treatment
    Combination    Response    Effect      Estimate
    (1)              6207
    ad               6164      A + BCD       -180
    bd               6183      B + ACD       -116
    ab               6134      AB + CD        -10
    cd               6210      C + ABD          6
    ac               6168      AC + BD          4
    bc               6181      BC + AD         -8
    abcd             6135      D + ABC          2
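The estimates listed for Example 5 are the usual signed sums of the eight
responses. The sketch below recomputes them from the responses in the table;
the function name `contrast` is introduced here for illustration, and the
values are reported as raw contrasts, without dividing by the number of runs,
to match the tabulated figures.

```python
# Responses from the Example 5 table, keyed by treatment combination
responses = {"(1)": 6207, "ad": 6164, "bd": 6183, "ab": 6134,
             "cd": 6210, "ac": 6168, "bc": 6181, "abcd": 6135}

def contrast(effect):
    """Sum of responses, each signed by the product of the levels in `effect`."""
    total = 0
    for run, y in responses.items():
        sign = 1
        for factor in effect.lower():
            # A factor is at its high level when its letter appears in the run label
            sign *= 1 if factor in run else -1
        total += sign * y
    return total

estimates = {effect: contrast(effect)
             for effect in ["A", "B", "AB", "C", "AC", "BC", "D"]}
for effect, value in estimates.items():
    print(effect, value)
```

Because the design is a half fraction with I = ABCD, each value also contains
the alias shown in the Effect column.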

Note that the two largest effects are A and B (and other aliases). The effects
of group factors C and D are small, and consequently the factors for items 3 and
4 are considered negligible. Therefore, following the initial 8 runs, we have
reduced the set of potentially active factors from 13 to 7. The 7 remaining
factors, $Q_1$, $r_1$, $w_1$, $\pi_1$, $Q_2$, $r_2$, and $w_2$, could be
investigated using a $2^{7-p}$ fractional factorial plan, such as illustrated
earlier.


2-5.3 Group Screening With More Than Two Stages

Patel [1962] and Li [1962] have generalized Watson's results to more
than two stages. Their procedures are very similar. Patel showed that the total
number of runs is minimized if we choose the number of groups according to

$$ g_1 = k p^{n/(n+1)} $$

$$ g_{n+1} = p^{-1/(n+1)} $$

where $g_i$ is the number of groups into which each of the groups at stage i-1
is split. He also notes that an n-stage procedure is preferable to an n-1 stage
procedure if

$$ p < [1 - (1/n)]^{n(n-1)}. $$

Group sizes decrease geometrically with parameter $p^{1/(n+1)}$. Note that if we
suspect that more than one-fourth of the factors are active (p > .25), then the
optimum number of stages is one. If between one-twelfth and one-fourth of the
factors are active, then two stages should be used. Similarly, a three-stage
procedure would be used if between one-thirtieth and one-twelfth of the factors
are active. Clearly, these designs will be useful only in situations where p
(the ratio of active to total factors) is thought to be very small.
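The stage-selection rule can be tabulated directly. The sketch below evaluates
the bound $[1 - (1/n)]^{n(n-1)}$ for several n and returns the number of stages
the rule suggests for a given p; the function names are illustrative
assumptions. The exact bounds for n = 3 and n = 4 are 64/729 and
531441/16777216, which the text rounds to one-twelfth and one-thirtieth.

```python
from fractions import Fraction

def stage_threshold(n):
    """Bound on p below which n stages are preferred to n - 1 stages."""
    return Fraction(n - 1, n) ** (n * (n - 1))

def preferred_stages(p):
    """Number of stages suggested by applying the rule for n = 2, 3, ... in turn."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    n = 1
    while p < stage_threshold(n + 1):
        n += 1
    return n

for n in (2, 3, 4):
    print(n, stage_threshold(n), float(stage_threshold(n)))

for p in (0.30, 0.10, 0.05):
    print(p, preferred_stages(p))
```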

2-6.  Variance  Reduction  Considerations  in  Factor  Screening 

An important consideration in the design of a computer simulation experiment
is the incorporation of variance reduction methods into the design. Two
common variance reduction methods are the use of common pseudorandom numbers and
antithetic pseudorandom numbers for different points in the design. These
methods have application to factor screening. Early work on this problem was
by Fishman [1974]. Recently, a comprehensive treatment of the subject was
published by Schruben and Margolin [1978].

We assume that when common random number streams are used at two design
points, the two output statistics exhibit positive correlation, and when
antithetic random number streams are used at any two points, negative
correlation between outputs is induced. These assumptions are, of course, not
always met in practice, but they are satisfied relatively often, as has been
confirmed by numerous investigations (see Kleijnen [1975a], pp. 197-198).

Two possible estimation methods can be used, ordinary least squares (OLS)
or weighted least squares (WLS). These estimators are

$$ \hat{\boldsymbol{\beta}}_{OLS} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} $$

and

$$ \hat{\boldsymbol{\beta}}_{WLS} = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{V}^{-1}\mathbf{y} $$

where $\mathbf{V}$ is the correlation matrix induced on the responses. The
covariance matrices for these estimators are

$$ Cov(\hat{\boldsymbol{\beta}}_{OLS}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{V}\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} $$

and

$$ Cov(\hat{\boldsymbol{\beta}}_{WLS}) = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1} $$

A widely used criterion for comparing designs for estimating
$\boldsymbol{\beta}$ is the determinant of the covariance matrix of the
estimator. Designs that minimize this criterion are called D-optimal designs.
The determinants of the covariance matrices associated with the OLS and WLS
estimators are

$$ D_{OLS} = |\mathbf{X}'\mathbf{X}|^{-2}\,|\mathbf{X}'\mathbf{V}\mathbf{X}| $$

and

$$ D_{WLS} = |(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}| $$

The WLS estimator has smallest generalized variance among the class of linear
unbiased estimators. However, it is often impossible to calculate the WLS
estimate because the matrix $\mathbf{V}$ is unknown.

There are some situations in which the OLS and WLS estimators are equivalent,
and, hence, these two estimators would produce the same covariance matrix.
Schruben and Margolin [1978] show that the two estimators are equivalent for the
cases of the random number assignment schemes that minimize $D_{OLS}$. That is,
an induced correlation structure that would minimize $D_{OLS}$ is also one for
which the estimators $\hat{\boldsymbol{\beta}}_{OLS}$ and
$\hat{\boldsymbol{\beta}}_{WLS}$ (and hence $D_{OLS}$ and $D_{WLS}$) are
identical. Therefore, the OLS estimator can be used.

Schruben and Margolin [1978] propose the following rule. If an experimental
design admits orthogonal blocking into two blocks, then if for all points in
block 1 we use the same common set of pseudorandom numbers, and for all points
in block 2 we use the antithetic set of random numbers, then the OLS estimator
of $\boldsymbol{\beta}$ will have minimum generalized variance. Specifically,
this assignment rule will produce an estimator of $\beta_0$ that is superior to
that obtained by common random numbers, and equivalent in terms of dispersion to
common random numbers for estimating the remaining parameters in
$\boldsymbol{\beta}$. In general, the best results are obtained if the block
sizes are the same. Furthermore, the positive and negative correlations induced
do not have to be equal.

There are some special results that can be stated for the $2^{k-p}$ series of
designs. If the induced positive and negative correlations are equal in magnitude,
then the assignment rule above produces a minimum generalized variance for the
class of $2^{k-p}$ designs assuming that the linear model contains a mean
($\beta_0$) plus a subset of r < $2^{k-p}$ effects. This assignment rule also
minimizes the trace of the covariance matrix of $\hat{\boldsymbol{\beta}}$ (that
is, the sum of the variances of $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_r$
is minimized).
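The effect of the assignment rule on the OLS variances can be illustrated with
a small case. The sketch below takes a $2^2$ design with a mean-plus-main-effects
model, blocks it on the AB interaction, and evaluates the diagonal of
$(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{V}\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}$
under three schemes: independent streams, common streams at every point, and
common streams in block 1 with antithetic streams in block 2. Unit response
variance, an induced correlation of magnitude 1/2, and the choice of AB as the
blocking effect are assumptions made only for this illustration.

```python
from fractions import Fraction
from itertools import product

runs = list(product((-1, 1), repeat=2))      # 2^2 design, (A, B) levels
X = [[1, a, b] for a, b in runs]             # model: mean, A, B
n = len(X)
rho = Fraction(1, 2)                         # hypothetical induced correlation
block = [a * b for a, b in runs]             # blocks defined by the sign of AB

def ols_variances(V):
    """Diagonal of (X'X)^-1 X'VX (X'X)^-1, using (X'X)^-1 = I/n for this design."""
    return [sum(X[u][j] * V[u][v] * X[v][j] for u in range(n) for v in range(n))
            / Fraction(n * n)
            for j in range(len(X[0]))]

independent = [[Fraction(int(u == v)) for v in range(n)] for u in range(n)]
common = [[Fraction(1) if u == v else rho for v in range(n)] for u in range(n)]
# Positive correlation within a block, negative correlation between blocks
blocked = [[Fraction(1) if u == v else rho * block[u] * block[v]
            for v in range(n)] for u in range(n)]

for name, V in [("independent", independent), ("common", common),
                ("blocked", blocked)]:
    variances = ols_variances(V)
    print(name, [str(v) for v in variances], "trace =", sum(variances))
```

In this illustration the blocked scheme matches common random numbers for the
two main effects and gives a smaller variance for the mean, so its trace is the
smallest of the three.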

Occasionally, factor screening experiments will make use of saturated
designs. For a saturated design, any induced correlation structure between the
observations results in an improvement with respect to the generalized variance
criterion over that obtained from independently seeding each design point.
Furthermore, the OLS and WLS estimators are equivalent in this case also.

These results have direct application to factor screening. Any $2^k$ or
$2^{k-p}$ design that is not saturated can be run in two orthogonal blocks by
identifying the blocks with the + and - levels of one of the k columns in the
design. Thus, only k-1 factors could be investigated.

We now give some illustrations. First consider the $2^{k-p}$ design shown in
Table 3. We can run this design in two blocks, say


These blocks were formed by confounding the A8K effect (and its aliases; see
Table A) with blocks. The treatment combinations in block 1 would be run with
one set of common random numbers and those in block 2 would be run with the
antithetic set of random numbers.

As a second example, consider the design run in Example 1. Since 7
factors are considered in only 8 runs, this is a saturated fractional factorial.
If only this fraction is to be run, any induction of correlation is superior to
independent observations, so running all 8 observations with common random number
streams would be an appropriate strategy. Now, if any fraction from the same
family is added to the original fraction, the new fraction should be run using the
antithetic random number stream. Clearly, this is an optimal strategy, since the
two fractions together can be viewed as a fold-over design with the random number
stream effect taking the levels of the eighth factor (which is + in fraction 1
and - in fraction 2).

As a third example, consider the $2^{6-3}$ design in Table 7. This design
investigates 6 factors in 8 runs, and since it is not a saturated fraction, we
could obtain a minimum generalized variance by decomposing the design into two
orthogonal blocks of 4 runs each. Now any nonsaturated Resolution III plan can
be run in two blocks by identifying the + and - levels of a single additional
variable with the blocks. Thus, in our example, add a seventh column to Table
7 by setting the signs in that column equal to G = ABC. Thus, the signs are
-, +, +, -, +, -, -, and + for the runs def, af, be, abd, cd, ace, bcf, and
abcdef, respectively. Consequently, run treatment combinations def, abd,
ace, and bcf in block 1 (-) with a common set of random numbers, and treatment
combinations af, be, cd, and abcdef in block 2 with the antithetic set of random
numbers.

Now suppose upon examining the estimates of the effects from this fraction,
it is decided to add a second fraction from the same family to separate main
effects and two-factor interactions. The appropriate second fraction is

     A    B    C    D=-AB   E=-AC   F=-BC    Treatment Combination
     +    +    +      -       -       -      abc
     -    +    +      +       +       -      bcde
     +    -    +      +       -       +      acdf
     -    -    +      -       +       +      cef
     +    +    -      -       +       +      abef
     -    +    -      +       -       +      bdf
     +    -    -      +       +       -      ade
     -    -    -      -       -       -      (1)

In this new fraction, block 1 would consist of bcde, acdf, abef, and (1).
These runs would be made with the same set of random numbers used in block 1
from the first fraction. Block 2 in the new fraction would consist of abc,
cef, bdf, and ade. These runs would be made with the antithetic stream of random
numbers used in block 2 in the original fraction. It is easy to verify that the
final design is a $2^{6-2}$ plan, with generators I = BCDE = ACDF = ABEF. The
estimators from the combined design have minimum generalized variance.
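The verification mentioned above can be carried out mechanically. The sketch
below builds the two $2^{6-3}$ fractions (D = ±AB, E = ±AC, F = ±BC), checks
that all 16 combined runs satisfy I = BCDE = ACDF = ABEF, and splits each
fraction into blocks by the sign of ABC. The function names are introduced here
for illustration.

```python
from itertools import product

def fraction(sign):
    """2^(6-3) fraction with D = sign*AB, E = sign*AC, F = sign*BC."""
    return [(a, b, c, sign * a * b, sign * a * c, sign * b * c)
            for a, b, c in product((-1, 1), repeat=3)]

def label(run):
    """Treatment-combination label built from the factors at their high level."""
    name = "".join(ch for ch, x in zip("abcdef", run) if x == 1)
    return name or "(1)"

def blocks(runs):
    """Split runs by the sign of ABC: (block 1 with ABC = -1, block 2 with ABC = +1)."""
    block1 = sorted(label(r) for r in runs if r[0] * r[1] * r[2] == -1)
    block2 = sorted(label(r) for r in runs if r[0] * r[1] * r[2] == +1)
    return block1, block2

first, second = fraction(+1), fraction(-1)
combined = first + second

# Every run of the combined design satisfies the three defining words
defining_ok = all(b * c * d * e == 1 and a * c * d * f == 1 and a * b * e * f == 1
                  for a, b, c, d, e, f in combined)
print(defining_ok, len(set(combined)))
print(blocks(first))
print(blocks(second))
```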

2-7.  Evaluation  and  Choice  of  Screening  Designs 

In  this  section,  we  will  evaluate  the  characteristics  of  the  various  types 
of  screening  designs.  Hopefully,  this  will  provide  guidance  on  the  selection  of 
designs  in  practice. 

The $2^{k-p}$ fractional factorial design has many advantages in factor
screening. If we can afford N runs, where N is a power of 2, Resolution III plans
can be derived that incorporate up to N-1 factors. These plans require the
experimenter to assume that two-factor and higher interactions are negligible.
However, the assumptions regarding interactions can, to some extent, be checked
by combining the original $2^{k-p}$ design with a second fraction from the same
family. If the experimenter can afford up to N = 2k runs, the $2^{k-p}$
Resolution III and IV plans are highly recommended. The Plackett-Burman plans,
also of Resolution III, are not generally recommended for factor screening unless
the analyst knows in advance that all but a few two-factor interactions are
negligible. The heavy aliasing of main effects and two-factor interactions is an
undesirable property of these designs.

The supersaturated plans of Booth and Cox, like the Plackett-Burman
designs, assume that only main effects are active. If this assumption is false,
then the alias structure generated by a supersaturated design would be extremely
difficult to untangle. The group screening methods of Watson and Patel are
recommended instead. This approach would seem to have the economic efficiency
required in simulation, without the overly restrictive assumptions regarding
interactions. For the vast majority of screening problems, either two or three
stages will be sufficient. Once groups of factors are formed, it is recommended
that $2^{k-p}$ fractional factorials be used to investigate the group factors.

3.  SCREENING WITH UNDESIGNED AND PARTIALLY-DESIGNED DATA

3-1.  Factor  Screening  with  Regression  Models 

Very few factor screening studies will begin in an informationless state.
In most cases, we find that the analyst has some computational experience with
the simulation model. It would be economically efficient to incorporate as much
as possible of this historical information into the screening study.

In Section 2, we illustrated how the general linear model

$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} $$

could be used in factor screening. If an experiment can be designed for studying
the effect of the factors, very efficient parameter estimation techniques can be
used and data interpretation is relatively simple. One reason that the
designed-experiment case is so simple is that most screening designs are
orthogonal; that is, the regression coefficients $\boldsymbol{\beta}$ have
unconditional interpretations. If we apply the same approach to undesigned data
that may have been collected for other purposes (such as validation or
verification), this ease of interpretation is lost. However, it is still
possible to learn something about the relative importance of the factors.

When  dealing  with  undesigned  or  historical  data,  our  approach  is  to  fit 
an  appropriate  regression  model  to  the  data,  and  then  make  inferences  on  the 
model  parameters  to  determine  the  effects  of  the  factors.  This  is  often  hazardous, 
since it is well-known that the regression coefficients $\beta$ measure only the
partial effect of a variable. That is, $\beta_j$ measures the effect of $x_j$ conditional
on the other $x_i$ ($i \neq j$) in the regression model. Furthermore, depending on the
degree of nonorthogonality in the data, the least squares estimates of $\beta$ may be
very far from the true regression coefficients.

With  undesigned  data,  the  factor  screening  problem  consists  of  two  stages, 
(1) variable selection, and (2) interpretation of regression coefficients. We
will  discuss  these  problems  in  the  next  two  sections. 

There  may  also  be  a third  type  of  screening  study,  part-way  between  the 
extremes  of  designed  experiments  and  undesigned  experiments.  This  is  the  situation 
in  which  some  new  data  points  may  be  collected  for  use  with  the  original  undesigned 
data,  but  the  amount  of  new  data  to  be  added  is  not  enough  to  constitute  a fully- 
designed  screening  study.  We  will  discuss  methods  for  augmenting  undesigned 
data  for  factor  screening  studies  of  this  type. 

3-2.  Variable  Selection  Procedures 

There is a vast literature on variable selection in regression. A very
comprehensive review of this subject is in Hocking [1976]. Variable selection is
both an art and a science, and should be performed with care and caution. It
should be regarded as exploration of the structure of the data.

We may classify variable selection methods into two general types,
stepwise-type methods and search methods. Stepwise regression and its major
variations (forward selection and backward elimination) are well-known. These
procedures should not be used mechanically to find the "best" regression equation.
Moreover, the order in which variables enter and leave the model should not be
interpreted as measuring the relative importance of the factors. The existence
of multicollinearity (correlation between factors), which is often a function of
the disposition of the data in x-space, impacts the variable selection problem
significantly.

Search-type variable selection methods include the all possible regressions
algorithms, the Hocking-LaMotte SELECT procedure (see Hocking [1976] for a
description), and the directed t-search method (see Daniel and Wood [1971]).
These procedures often produce results superior to stepwise-type methods,
particularly for data that are badly nonorthogonal. The all possible regressions
method has much to recommend it, particularly when the number of factors is
small, say 20 or less. There are several good computationally efficient
algorithms for all possible regressions, including the Furnival and Wilson
[1974] algorithm, which is now available on BMD-P.

For factor screening purposes, stepwise-type methods can be used at the
outset of the problem, to reduce the number of factors to about 20. Generally,
backward elimination seems to work well at this stage, although any of the
stepwise-type procedures can produce good results if carefully used. Then one
of the search methods, such as all possible regressions, should be employed using
the subset of the original factors identified at the first stage. The end result
may be several final equations. Each good candidate equation should be examined
for adequacy and validity using the standard techniques of residual analysis
(see Draper and Smith [1966], Ch. 3). Since the primary objective of building
the regression model is to obtain good estimates of the parameters, the model
selection criterion should be chosen accordingly. Selecting the model that gives
a minimum mean square error will generally lead to good estimates of the individual
regression coefficients. Selection of variables based on $R^2$ (a popular practice)
often causes important variables to be left out of the equation.
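
The second stage of this strategy, an all possible regressions search ranked by
residual mean square, can be illustrated with a short Python sketch. The helper
names (solve, residual_mean_square, all_possible_regressions) and the small data
set are hypothetical, and the brute-force enumeration used here is only a
stand-in for the idea; it is not the Furnival and Wilson algorithm.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residual_mean_square(columns, y):
    """Fit y on an intercept plus the given columns; return SSE / (n - p)."""
    n = len(y)
    design = [[1.0] + [col[u] for col in columns] for u in range(n)]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(design[u][i] * y[u] for u in range(n)) for i in range(p)]
    beta = solve(xtx, xty)
    sse = sum((y[u] - sum(b * v for b, v in zip(beta, design[u]))) ** 2 for u in range(n))
    return sse / (n - p)


def all_possible_regressions(factors, y):
    """Return (MS_E, subset) for every non-empty subset of factors, smallest MS_E first."""
    names = list(factors)
    results = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            mse = residual_mean_square([factors[name] for name in subset], y)
            results.append((mse, subset))
    return sorted(results)


# Hypothetical data: y depends on x1 and x3 only; x2 is strongly correlated with x1.
factors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [1, 0, 0, 1, 1, 0, 1, 0],
}
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.0]
y = [2 + 3 * a - 2 * c + e for a, c, e in zip(factors["x1"], factors["x3"], noise)]

for mse, subset in all_possible_regressions(factors, y)[:3]:
    print(subset, round(mse, 4))
```

With k candidate factors the enumeration fits 2^k - 1 equations, which is why the
text suggests first reducing the problem to about 20 factors.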

3-3.  Interpretation  of  Regression  Coefficients

As noted previously, interpretation of regression coefficients is hazardous,
since $\beta_j$ measures the effect of $x_j$ given that other factors $x_i$ ($i \neq j$) are also in
the model. Furthermore, the magnitudes of the individual coefficients are
affected by the units of the factors and the response y. For this reason it is
usually best to work with standardized coefficients (often identified as "beta
coefficients" in regression computer program outputs). In general the standardized
coefficients $\beta^*$ are found by solving

$$C\beta^* = g \tag{3-1}$$

where C is the correlation matrix of the k factors and g is a vector of simple
correlations between $x_j$ and the response y. The relationship between the
standardized and original regression coefficients is

$$\beta_j = \beta_j^*\,(S_{yy}/S_{jj})^{1/2} \tag{3-2}$$

where $S_{yy}$ is the corrected sum of squares of y and $S_{jj}$ is the corrected sum of
squares of $x_j$.
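
As a rough illustration, the standardized coefficients can be computed by
forming the correlation matrix C and the correlation vector g, solving the
linear system, and rescaling by the corrected sums of squares. The Python sketch
below assumes a tiny made-up data set; the function names are hypothetical.

```python
from math import sqrt


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def corrected_cross_product(u, v):
    """Sum of cross products of u and v about their means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def standardized_coefficients(xs, y):
    """Return (beta_star, beta): standardized and original-unit slope estimates."""
    k = len(xs)
    s = [[corrected_cross_product(xs[i], xs[j]) for j in range(k)] for i in range(k)]
    syy = corrected_cross_product(y, y)
    # Correlation matrix of the factors and correlations of each factor with y.
    c = [[s[i][j] / sqrt(s[i][i] * s[j][j]) for j in range(k)] for i in range(k)]
    g = [corrected_cross_product(xs[i], y) / sqrt(s[i][i] * syy) for i in range(k)]
    beta_star = solve(c, g)
    # Rescale each standardized coefficient back to the original units.
    beta = [beta_star[j] * sqrt(syy / s[j][j]) for j in range(k)]
    return beta_star, beta


# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 5, 7]
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]
beta_star, beta = standardized_coefficients([x1, x2], y)
print(beta_star, beta)
```

Because the standardized coefficients are unit-free, they can be compared across
factors measured on very different scales, subject to the cautions that follow.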

While the magnitude and sign of the standardized regression coefficients
are often used as measures of importance of the factors, we must remember that
the partial nature of these coefficients still hampers interpretation. Only if
the factors $x_j$, j = 1, 2, ..., k are orthogonal (or nearly so) is the total effect of
$x_j$ reflected by $\beta_j^*$. Therefore, we should examine the final set of factors
$x_j$, j = 1, 2, ..., k and measure the extent of departure from orthogonality before
interpreting the individual standardized regression coefficients. One useful
measure of orthogonality is |C|. If |C| = 1, the factors are orthogonal, while
if |C| = 0, there is at least one linear dependency in the factors. Therefore,
if |C| is large, say close to 1, we feel relatively confident in interpreting
the individual regression coefficients. On the other hand, if |C| is small,
say |C| < 0.1, then we suspect that severe multicollinearity is present, and,
consequently, the regression coefficients are very unstable. In such a case,
interpretation of the individual coefficients would be very risky.

For intermediate values of |C|, between 0.1 and a value close to 1, other
measures of multicollinearity should be examined. These include the variance
inflation factors (the main diagonal elements of $C^{-1}$), and the eigenvalues
of C. If the largest variance inflation factor is greater than 10, or if the
ratio of the largest to smallest eigenvalue (called the conditioning number)
exceeds 10, then corrective action should be taken before interpreting the
individual coefficients. Corrective action would consist of re-estimating the
parameters by a method specifically designed to combat multicollinearity.
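
The three diagnostics just described (the determinant of C, the variance
inflation factors, and the eigenvalue ratio) can be sketched in Python as
follows. The thresholds of 10 are those quoted in the text; the function names
and the example correlation matrix are hypothetical, and the Jacobi rotation
used for the eigenvalues is simply one convenient choice for small symmetric
matrices.

```python
from math import sqrt


def determinant(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def variance_inflation_factors(c):
    """Diagonal of the inverse of c, computed as principal minor / determinant."""
    n = len(c)
    det = determinant(c)
    vifs = []
    for j in range(n):
        minor = [[c[r][s] for s in range(n) if s != j] for r in range(n) if r != j]
        vifs.append(determinant(minor) / det)
    return vifs


def symmetric_eigenvalues(a, sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, ascending."""
    m = [row[:] for row in a]
    n = len(m)
    for _ in range(sweeps):
        off = sum(m[i][j] * m[i][j] for i in range(n) for j in range(n) if i != j)
        if off < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(m[p][q]) < 1e-300:
                    continue
                theta = (m[q][q] - m[p][p]) / (2.0 * m[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + sqrt(theta * theta + 1.0))
                c = 1.0 / sqrt(t * t + 1.0)
                s = t * c
                for r in range(n):  # rotate columns p and q
                    mrp, mrq = m[r][p], m[r][q]
                    m[r][p], m[r][q] = c * mrp - s * mrq, s * mrp + c * mrq
                for r in range(n):  # rotate rows p and q
                    mpr, mqr = m[p][r], m[q][r]
                    m[p][r], m[q][r] = c * mpr - s * mqr, s * mpr + c * mqr
    return sorted(m[i][i] for i in range(n))


def multicollinearity_report(c, vif_limit=10.0, ratio_limit=10.0):
    """Return (|C|, VIFs, eigenvalue ratio, whether corrective action is indicated)."""
    det = determinant(c)
    vifs = variance_inflation_factors(c)
    eig = symmetric_eigenvalues(c)
    ratio = eig[-1] / eig[0]
    return det, vifs, ratio, max(vifs) > vif_limit or ratio > ratio_limit


# Hypothetical correlation matrix for three factors.
c = [[1.0, 0.9, 0.2],
     [0.9, 1.0, 0.3],
     [0.2, 0.3, 1.0]]
det, vifs, ratio, needs_action = multicollinearity_report(c)
print(round(det, 3), [round(v, 2) for v in vifs], round(ratio, 1), needs_action)
```

In this made-up case |C| falls in the intermediate range, so the secondary
measures are the ones that decide whether the coefficients can be interpreted.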

A well-known parameter estimation method designed to combat multicollinearity
is ridge regression. The ridge regression estimates are defined as

$$\beta^*(k) = (C + kI)^{-1} g \tag{3-3}$$

where $k \geq 0$. The usual method for choosing k is to solve (3-3) for various k,
plot $\beta^*(k)$ versus k, and select k as the value at which reasonable stabilization
of the coefficients $\beta^*(k)$ results. This plot is called the ridge trace. For
further details, see Hoerl and Kennard [1970].

If (3-3) is applied to the full set of factors, then the ridge trace is
used to eliminate negligible factors. The rules for elimination of factors are:

1.  Eliminate factors whose standardized coefficients are stable but small.

2.  Eliminate factors whose coefficients are unstable and tend to zero as
k increases.

3.  Eliminate one or more factors with unstable coefficients.

The remaining set of variables should be examined for near-orthogonality. This
may be done graphically by plotting $D = \beta^{*\prime}(k)\beta^*(k)$ against k. Note that D is
the squared distance of $\beta^*(k)$ from the origin. It can be shown that for an
orthogonal system, the squared distance of the ridge coefficients from the
origin should be $\beta^{*\prime}(0)\beta^*(0)/(1+k)^2$. If the factors are nearly orthogonal, the
graph of these two functions should be nearly identical.
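
The ridge trace and the near-orthogonality check can be sketched in Python as
follows. This is a minimal illustration, assuming a hypothetical two-factor
correlation matrix C and correlation vector g; the function names are ours. For
each value of the ridge constant k it solves (C + kI) b = g, then compares the
squared length of the ridge coefficients with the curve expected for an
orthogonal system.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_coefficients(c, g, k):
    """Ridge estimates: the solution b of (C + kI) b = g."""
    n = len(g)
    shifted = [[c[i][j] + (k if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(shifted, g)


def squared_length(v):
    """Squared distance of the coefficient vector from the origin."""
    return sum(x * x for x in v)


def orthogonal_reference(c, g, k):
    """Squared length expected at ridge constant k if the factors were orthogonal."""
    return squared_length(ridge_coefficients(c, g, 0.0)) / (1.0 + k) ** 2


# Hypothetical two-factor problem with strongly correlated factors.
c = [[1.0, 0.9], [0.9, 1.0]]
g = [0.8, 0.6]
for k in [0.0, 0.05, 0.1, 0.2, 0.5, 1.0]:
    b = ridge_coefficients(c, g, k)
    print(k, [round(v, 3) for v in b],
          round(squared_length(b), 3), round(orthogonal_reference(c, g, k), 3))
```

Plotting the two printed columns of squared lengths against k gives the
graphical comparison described above; a large gap between them signals that the
retained factors are still far from orthogonal.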

EXAMPLE 6.  Consider the four-item inventory problem described earlier. Table
12 contains 30 observations on average annual cost and the corresponding values
of the independent variables $Q_1$, $Q_2$, $Q_3$, $Q_4$, $r_1$, $r_2$, $r_3$, $r_4$, $\Pi_1$, and $U_1$. These
30 runs do not correspond to any standard factor screening design. We will
illustrate how regression methods can be used to identify the most influential
factors.

These 10 variables were analyzed using the BMD-P stepwise multiple regression
program P2R. The F-level for entering and removing variables was arbitrarily
set at 4.0 (the logic for this choice stems from the fact that $t^2 = F$, and t = 2.0
corresponds roughly to 95 percent significance). The results of this analysis
are summarized below, following Table 12:


Table  12

Data  for  the  Inventory  Problem,  Example  6

Run   Cost     Q1     Q2     Q3     Q4     r1     r2     r3     r4     Π1     U1

  1   4449  10490   4230   3310   8970  22380   8810   3940  14200    .70  32010
  2   4223  22370   3750   5350   7320  24130   8240   6800  21830    .27  39120
  3   4181   9940   7790   3490  10490  37370   4100   6540  18740    .15  23490
  4   4142   9400   8940   4230   4990  28920  11490   7510   7060    .39  27190
  5   4194   8940   8340   4480   7390  32400   5130   4270   5680    .60  25830
  6   4188  10340   4330   4120  10390  29300  10280   3860   8430    .18  35640
  7   4140  17940   5180   3600   4340  29700   7070   3460  17620    .71  11400
  8   4145   9220   5220   3380  11400  23900   9920   3080   9210    .57  22900
  9   4174  14400   4840   3870   4880  17940   4200   3310  13600    .38  17030
 10   4438  13010   5410   3990   4270  17430   4370   4230   4700    .56  25770
 11   4198   9300   3240   6140   9140  18380   4990   4320  15400    .18  24010
 12   4214  23940   8130   4440   9740  24410   9770   4320   6120    .36  34070
 13   4174  18380   4900   6340   7250  20390   7230   4220  14800    .67  21990
 14   4149   HOOO   4630   4080   3440  33040  10040   3050   8200    .47  16'S0
 15   4492   8120   4690   4380   9420  27000   4700   3410   6140    .60  34460
 16   4132  12350   3990   3000   3230  32340  10330   6040   9900    .55  26170
 17   4218  21490   7280   4440   9800  21380   5100   6180  20300    .19  33330
 18   4138  13780   3680   3380   9140  34840   9880   6470  21120    .83  17500
 19   4383  20200   6940   5820   3130  13010  11130   3930   9260    .77  28100
 20   4283  14440   4090   5920   4210  23390   4400   7560   7130    .90  29200
 21   4287  19310   6110   5190   3380  22330   9870   4530   4900    .15  34590
 22   4189  18040   5400   4940   3790  14440   4540   6260  21060    .76  12340
 23   4714   9390   8340   6260   4930  19230   4870   3000  19930    .65  34900
 24   4138  15110   4140   3780   8880  23490  11200   7900  17420    .40  15660
 25   4193  21780   4270   3000     --  24110   5530   5060  15020    .41  28270
 26   4141  18230   7000   6930   4990  27080  10990   3920   8130    .84  10980
 27   4142  17450   3390   6170  L1300  28910   4120   6980   7660    .55  22220
 28   4276  13900   3320   3940   8220  30470   7080   4400  10410    .50  39150
 29   4138  13470   8340   3840   3790  39980   9190   6130  12240    .45  25000
 30   4133  19120   6100   3980   7110  34150   8030   7620  14240    .13  24900

--  entry illegible in the source.

Variable    Standardized Coefficient    Partial F Statistic

Q1                  -0.172                     7.438
r1                  -0.442                    13.139
Π1                   0.378                     8.940
U1                   0.612                    23.866

This equation has $R^2$ = 0.6610 and $MS_E$ = 6916.97. A plot of residuals from this
model versus the predicted values is shown in Figure 7. This display
indicates a tendency to underpredict cost near the extreme value of the response
variable. This could occur either because important variables have been left
out of the model, or because the relationship between cost and the independent
variables is not linear. In this problem, considering that we know that average
annual inventory cost is not a linear function of Q and r, it would seem that
the latter possibility should be explored.

Since none of the variables associated with items 2, 3, or 4 are apparently
significant, they are ignored, and the data analyzed with the candidate
variables $Q_1$, $r_1$, $Q_1 r_1$, $r_1^2$, $\Pi_1$, and $U_1$. This second analysis is performed with
the BMD-P all possible regressions algorithm (Furnival and Wilson [1974]) P9R.
The criterion for model selection is minimum $MS_E$. The results are shown below:


Variable    Standardized Coefficient    t-statistic

Q1                  -1.350                 -2.20
r1                  -2.995                 -2.57
Q1 r1                1.175                  1.72
r1^2                 1.767                  1.92
Π1                   0.401                  3.34
U1                   0.667                  5.46

This equation has $R^2$ = 0.7226 and $MS_E$ = 6206.86. Clearly $Q_1$ has a strong
linear effect, and $r_1$ exhibits both linear and second-order effects. The
variables $Q_1$ and $r_1$ are much more influential than $\Pi_1$ and $U_1$ in explaining the
variation in average annual cost. There is also evidence of an interaction
between $Q_1$ and $r_1$.

A plot of the residuals from this model versus the corresponding fitted
values is shown in Figure 8, and a normal probability plot of residuals is shown
in Figure 9. These displays do not indicate any gross violation of assumptions.

3-4.  Augmenting  Undesigned  Data

In screening situations where some additional runs can be added to existing
data, a natural question is the development of criteria for locating these new
observations. If multicollinearity is a significant problem in the original
data, then it seems logical to locate the new points so as to alleviate this
problem, insofar as that is possible. On the other hand, if multicollinearity
is not present, then other criteria could be developed.

A symptom of multicollinearity is a small value of |C|. Therefore,
if m new runs are to be made, they should be at points in the factor space chosen
to maximize |C|. If there are k factors, and if we think of the region of interest
for these factors as a k-dimensional hypercube, then |C| is maximized by adding m
new runs at the corners of the experimental region. For details of this procedure,
see Gaylor and Merrill [1968] and Dykstra [1966]. Their procedure allows the
coordinates of all m new points to be determined simultaneously. If sequential
augmentation is desired, then adding each new run at that point in the factor
space where the variance of the predicted response is maximized will also
maximize |C|.
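
The sequential rule can be sketched in Python as follows. This is only an
illustration under stated assumptions: it works with the cross-product matrix
X'X of a first-order model with intercept rather than with C itself, it
restricts candidate runs to the corners of the hypercube with coordinates -1 and
+1, and the function names and starting design are hypothetical. It relies on
the identity |X'X + xx'| = |X'X| (1 + x'(X'X)^{-1} x), so the corner with the
largest prediction variance gives the largest determinant after augmentation.

```python
from itertools import product


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def cross_product_matrix(rows):
    """X'X for a list of model rows."""
    p = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]


def prediction_variance(xtx, x):
    """x'(X'X)^{-1}x, the variance of the predicted response in units of sigma^2."""
    z = solve(xtx, x)
    return sum(a * b for a, b in zip(x, z))


def next_run(design, k):
    """Hypercube corner at which the current prediction variance is largest."""
    xtx = cross_product_matrix([[1.0] + list(pt) for pt in design])
    corners = product((-1.0, 1.0), repeat=k)
    return max(corners, key=lambda pt: prediction_variance(xtx, [1.0] + list(pt)))


def augment(design, k, m):
    """Add m runs one at a time, each at the current maximum-variance corner."""
    design = list(design)
    for _ in range(m):
        design.append(next_run(design, k))
    return design


# Hypothetical undesigned data: two factors that move almost together.
design = [(-1.0, -1.0), (-0.5, -0.4), (0.0, 0.1), (0.5, 0.5), (1.0, 1.0)]
print(augment(design, 2, 2)[-2:])
```

For this made-up data the existing points lie close to one diagonal of the
square, so the rule places new runs at the off-diagonal corners, which is where
the collinearity is broken most effectively.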

Figure  9.  Normal  Probability  Plot  of  Residuals.  Example  6.

Maximizing |C| is a variance-oriented criterion. It is a reasonable
criterion if the form of the model fit to the data is correct. However, in most
factor screening studies, we have made the assumption that some effects are
negligible. Since there is always the possibility that these assumptions are
incorrect, the analyst could elect to augment his original data with points
chosen so that the bias in regression estimates from excluded factors is minimized.
We will now discuss a data augmentation scheme for this situation.

Suppose that we have fit the model

$$y = X_1\beta_1 + \varepsilon$$

but the response is really determined by the relationship

$$E(y) = X_1\beta_1 + X_2\beta_2.$$

Assume that the independent variables are defined such that the center of the
region of interest R is at (0, 0, ..., 0). The region of interest is a k-dimensional
unit sphere and should include all points in the undesigned data. Care must be
taken in selecting R, since bias is not invariant under the transformation and
different results would be obtained for different regions of interest.

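The mean square error is a measure of both bias and variance. The mean
square error, integrated over the region of interest and normalized, is

$$J = \frac{N\Omega}{\sigma^2}\int_R E[\hat{y}(x) - \eta(x)]^2\,dx$$

$$\;\; = \mathrm{trace}[\mu_{11}M_{11}^{-1}] + \alpha_2'\left[(\mu_{22} - \mu_{12}'\mu_{11}^{-1}\mu_{12}) + (M_{11}^{-1}M_{12} - \mu_{11}^{-1}\mu_{12})'\mu_{11}(M_{11}^{-1}M_{12} - \mu_{11}^{-1}\mu_{12})\right]\alpha_2 \tag{3-4}$$

where $M_{ij} = N^{-1}X_i'X_j$, i, j = 1, 2, are matrices of design moments,

$$\mu_{ij} = \Omega\int_R x_i x_j'\,dx \tag{3-5}$$

are region moment matrices ($x_1$ and $x_2$ being the vectors of terms that form the
rows of $X_1$ and $X_2$), $\Omega^{-1} = \int_R dx$, $\alpha_2 = \sqrt{N}\,\beta_2/\sigma$, N is the number of
observations, and $\sigma^2$ is the experimental error variance.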

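The average mean square error is composed of two terms, the average variance

$$V = \mathrm{trace}[\mu_{11}M_{11}^{-1}] \tag{3-6}$$

and the average squared bias

$$B = \alpha_2'\left[(\mu_{22} - \mu_{12}'\mu_{11}^{-1}\mu_{12}) + (M_{11}^{-1}M_{12} - \mu_{11}^{-1}\mu_{12})'\mu_{11}(M_{11}^{-1}M_{12} - \mu_{11}^{-1}\mu_{12})\right]\alpha_2. \tag{3-7}$$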


Average squared bias is minimized when design moments are equal to region moments,
or

$$M_{11} = \mu_{11} \quad \text{and} \quad M_{12} = \mu_{12}. \tag{3-8}$$

Average squared bias then is a function only of the region moments, which are not
controllable by the experimenter, and its minimum value is

$$B_{\min} = \alpha_2'(\mu_{22} - \mu_{12}'\mu_{11}^{-1}\mu_{12})\alpha_2. \tag{3-9}$$

An undesigned experiment will not meet the conditions in Equation (3-8),
but it is possible to augment the experiment in such a way that the conditions
will be met or nearly met. We then are operating on the controllable part of
average squared bias, say

$$B_c = \alpha_2'(M_{11}^{-1}M_{12} - \mu_{11}^{-1}\mu_{12})'\mu_{11}(M_{11}^{-1}M_{12} - \mu_{11}^{-1}\mu_{12})\alpha_2. \tag{3-10}$$


Consider first the case of fitting a model containing all second order
effects when some third order effects are present. We desire design moments to
equal region moments through order 5. Expressing the equalities in equation
(3-8) results in a set of simultaneous non-linear equations that can be solved
for the additional experimental trials necessary. For example, the pure second
design moments should equal the corresponding region moments, or

$$\sum_{u=1}^{N} x_{iu}^2 = N/(k+2), \quad i = 1, \ldots, k, \tag{3-11}$$

where k is the number of factors. Similar equations are written for the other
moments. For the N observations already taken, the left hand side of Equation
(3-11) is constant. We can now select m additional runs so that

$$\sum_{u=1}^{W} x_{iu}^2 = W/(k+2),$$

where W = N + m. The levels of the variables for the additional runs are $x_{iu}$,
u = N+1, ..., W.
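
As a small worked illustration of this condition, the Python sketch below
computes, for each factor, how much the m new runs must contribute to the pure
second moment sum so that the augmented design of W = N + m runs reaches
W/(k + 2). The function name and the three existing runs are hypothetical.

```python
def second_moment_shortfall(existing_runs, m):
    """Sum of squares that m new runs must add, per factor, to reach W / (k + 2)."""
    n = len(existing_runs)
    k = len(existing_runs[0])
    w = n + m
    target = w / (k + 2)
    return [target - sum(run[i] ** 2 for run in existing_runs) for i in range(k)]


# Hypothetical existing data: N = 3 runs on k = 2 factors inside the unit circle.
existing_runs = [(0.5, 0.0), (-0.5, 0.5), (0.0, -0.5)]
print(second_moment_shortfall(existing_runs, m=1))
```

A negative entry would mean that the existing runs already exceed the target for
that factor, so no choice of new runs alone can satisfy the equality and only an
approximate match is possible.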

The selection of the m additional runs is accomplished by minimizing the
function

$$
\begin{aligned}
F(x) ={}& \sum_{i=1}^{k}\Big(\sum_{u=1}^{W} x_{iu}\Big)^2
 + \sum_{i=1}^{k}\sum_{j>i}\Big(\sum_{u=1}^{W} x_{iu}x_{ju}\Big)^2 \\
&+ \sum_{i=1}^{k}\Big(\sum_{u=1}^{W} x_{iu}^2 - W/(k+2)\Big)^2
 + \sum_{i=1}^{k}\sum_{j\ge i}\sum_{l\ge j}\Big(\sum_{u=1}^{W} x_{iu}x_{ju}x_{lu}\Big)^2 \\
&+ \sum_{i=1}^{k}\sum_{j\ne i}\Big(\sum_{u=1}^{W} x_{iu}^3x_{ju}\Big)^2
 + \sum_{i=1}^{k}\sum_{j\ne i}\sum_{l>j,\,l\ne i}\Big(\sum_{u=1}^{W} x_{iu}^2x_{ju}x_{lu}\Big)^2 \\
&+ \sum_{i=1}^{k}\sum_{j>i}\sum_{l>j}\sum_{p>l}\Big(\sum_{u=1}^{W} x_{iu}x_{ju}x_{lu}x_{pu}\Big)^2 \\
&+ \sum_{i=1}^{k}\sum_{j>i}\Big(\sum_{u=1}^{W} x_{iu}^2x_{ju}^2 - W/[(k+2)(k+4)]\Big)^2 \\
&+ \sum_{i=1}^{k}\Big(\sum_{u=1}^{W} x_{iu}^4 - 3W/[(k+2)(k+4)]\Big)^2 \\
&+ \sum_{i=1}^{k}\sum_{j\ge i}\sum_{l\ge j}\sum_{p\ge l}\sum_{q\ge p}\Big(\sum_{u=1}^{W} x_{iu}x_{ju}x_{lu}x_{pu}x_{qu}\Big)^2
\end{aligned}
\tag{3-12}
$$


For  the  case  where  we  fit  a main  effects  model  and  the  true  system  contains 
second-order  effects,  moments  through  order  3 must  be  equal  and  this  is 
accomplished  by  minimization  of  F(x)  in  Equation  (3-12)  considering  only  the 
first  four  terms. 
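
For the main effects case, the moment-matching objective can be sketched in
Python as follows. This is a minimal illustration under stated assumptions: the
function names are hypothetical, the third-order part is taken to include every
third-order moment, and a coarse grid search inside the unit sphere stands in
for whatever search technique the analyst prefers.

```python
from itertools import combinations_with_replacement, product


def moment_sum(runs, indices):
    """Sum over runs of the product of the listed coordinates (repeats allowed)."""
    total = 0.0
    for run in runs:
        term = 1.0
        for i in indices:
            term *= run[i]
        total += term
    return total


def discrepancy_through_order_3(runs):
    """Squared deviations of design moment sums from W times the region moments."""
    k = len(runs[0])
    w = len(runs)
    f = 0.0
    for i in range(k):                      # first-order moments, target 0
        f += moment_sum(runs, (i,)) ** 2
    for i in range(k):                      # mixed second-order moments, target 0
        for j in range(i + 1, k):
            f += moment_sum(runs, (i, j)) ** 2
    for i in range(k):                      # pure second-order moments, target W/(k+2)
        f += (moment_sum(runs, (i, i)) - w / (k + 2)) ** 2
    for idx in combinations_with_replacement(range(k), 3):   # third order, target 0
        f += moment_sum(runs, idx) ** 2
    return f


def best_additional_run(runs, steps=21):
    """Grid point in the unit sphere that minimizes the discrepancy when added."""
    k = len(runs[0])
    grid = [-1.0 + 2.0 * s / (steps - 1) for s in range(steps)]
    candidates = [pt for pt in product(grid, repeat=k)
                  if sum(v * v for v in pt) <= 1.0 + 1e-12]
    return min(candidates, key=lambda pt: discrepancy_through_order_3(list(runs) + [pt]))


# Hypothetical undesigned data on k = 2 factors.
runs = [(0.6, 0.5), (0.3, 0.4), (-0.2, -0.1), (0.5, 0.6)]
new_run = best_additional_run(runs)
print(new_run, discrepancy_through_order_3(runs + [new_run]))
```

Calling best_additional_run repeatedly, appending each selected point to the
design, gives one run at a time in the sequential manner described later in this
section.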

The  bias  will  be  minimized  if  the  additional  design  points  can  be 
selected  so  that  F(x)  is  zero.  However,  in  many  cases  when  adding  only  a 
limited  number  of  design  points  the  minimum  value  of  F(x)  is  greater  than  zero, 
that  is,  not  all  moments  can  simultaneously  be  adjusted  to  the  required  values. 
In  those  cases  where  the  minimum  possible  value  of  F(x)  > 0 the  augmented  design 
will  minimize  bias  only  when  the  contribution  to  the  controllable  part  of  bias 
resulting  from  any  design  moment  not  equalling  the  corresponding  region  moment 
is the same for all moments, that is, all components of $B_c$ are equal.

The value of $\alpha_2$ increases as we add observations, causing the minimum
value of bias, $B_{\min}$, to increase while the value of the controllable component
of bias, $B_c$, is decreasing. $B_c$ can be decreased to zero, in which case further
additional observations can only increase average squared bias due to the
increase in $\alpha_2$. Also, the amount of increase in $B_{\min}$ may become greater than
the decrease possible in $B_c$. This indicates that a measure is necessary that
will indicate when average squared bias is at a minimum.


We select as such a measure the percentage reduction in bias, say

$$PR = \frac{B_0 - B_a}{B_0} \times 100\%$$

where the subscript 0 indicates the original value of the undesigned experiment
and the subscript a indicates the value after augmentation. By letting the terms
inside the square brackets in equation (3-7) be denoted by Q, we can express PR as

$$PR = \frac{\alpha_{20}'Q_0\alpha_{20} - \alpha_{2a}'Q_a\alpha_{2a}}{\alpha_{20}'Q_0\alpha_{20}}
 = \frac{N_0(\beta_2/\sigma)'Q_0(\beta_2/\sigma) - N_a(\beta_2/\sigma)'Q_a(\beta_2/\sigma)}{N_0(\beta_2/\sigma)'Q_0(\beta_2/\sigma)}$$

or

$$PR = \frac{N_0Q_0 - N_aQ_a}{N_0Q_0}. \tag{3-13}$$


It can be seen that PR does not depend on the unknown value of $\beta_2/\sigma$. The
procedure to minimize bias is to determine the maximum number of new runs allowed,
then sequentially select one run at a time by minimizing F(x) m times. Any
unconstrained search technique could be used to minimize F(x).
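
The stopping measure can be illustrated with a few lines of Python. This sketch
treats Q as a scalar, as the final form of PR does, and the numerical values of
Q after each added run are invented for illustration; in practice they would
come from the design and region moments of each augmented design.

```python
def percentage_reduction(n0, q0, na, qa):
    """Percentage reduction in average squared bias, (N0*Q0 - Na*Qa) / (N0*Q0) * 100."""
    return 100.0 * (n0 * q0 - na * qa) / (n0 * q0)


def best_number_of_runs(n0, q0, q_after):
    """q_after[m - 1] is Q after m added runs; return (m, PR) with the largest PR."""
    scores = [(percentage_reduction(n0, q0, n0 + m, q), m)
              for m, q in enumerate(q_after, start=1)]
    pr, m = max(scores)
    return m, pr


# Hypothetical values: N0 = 10 original runs with Q0 = 0.5, and Q after 1..4 added runs.
m, pr = best_number_of_runs(10, 0.5, [0.4, 0.25, 0.24, 0.24])
print(m, pr)
```

In this invented sequence the third and fourth runs reduce Q very little, so
the growth in the number of observations outweighs the gain and PR declines,
which is the situation the measure is meant to detect.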